Predict Bike Sharing Demand with AutoGluon Template¶

Project: Predict Bike Sharing Demand with AutoGluon¶

This notebook is a template with each step that you need to complete for the project.

Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.

Once you have completed all the code implementations, please export your notebook as a HTML file so the reviews can view your code. Make sure you have all outputs correctly outputted.

File-> Export Notebook As... -> Export Notebook as HTML

There is a writeup to complete as well after all code implememtation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.

Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.

Step 1: Create an account with Kaggle¶

Create Kaggle Account and download API key¶

Below is example of steps to get the API username and key. Each student will have their own username and key.

  1. Open account settings. kaggle1.png kaggle2.png
  2. Scroll down to API and click Create New API Token. kaggle3.png kaggle4.png
  3. Open up kaggle.json and use the username and key. kaggle5.png

Step 2: Download the Kaggle dataset using the kaggle python library¶

Open up Sagemaker Studio and use starter template¶

  1. Notebook should be using a ml.t3.medium instance (2 vCPU + 4 GiB)
  2. Notebook should be using kernal: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)

Install packages¶

In [2]:
!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller aws instances may have trouble installing
Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (21.3.1)
Collecting pip
  Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.3.1
    Uninstalling pip-21.3.1:
      Successfully uninstalled pip-21.3.1
Successfully installed pip-22.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (59.3.0)
Collecting setuptools
  Using cached setuptools-66.1.1-py3-none-any.whl (1.3 MB)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (0.34.2)
Collecting wheel
  Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
Installing collected packages: wheel, setuptools
  Attempting uninstall: wheel
    Found existing installation: wheel 0.34.2
    Uninstalling wheel-0.34.2:
      Successfully uninstalled wheel-0.34.2
  Attempting uninstall: setuptools
    Found existing installation: setuptools 59.3.0
    Uninstalling setuptools-59.3.0:
      Successfully uninstalled setuptools-59.3.0
Successfully installed setuptools-66.1.1 wheel-0.38.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting mxnet<2.0.0
  Using cached mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl (49.1 MB)
Collecting bokeh==2.0.1
  Using cached bokeh-2.0.1-py3-none-any.whl
Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (4.0.1)
Requirement already satisfied: PyYAML>=3.10 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (5.4.1)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (2.8.2)
Requirement already satisfied: packaging>=16.8 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (21.3)
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (1.19.1)
Requirement already satisfied: tornado>=5 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (6.1)
Requirement already satisfied: pillow>=4.0 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (9.0.0)
Requirement already satisfied: Jinja2>=2.7 in /usr/local/lib/python3.8/dist-packages (from bokeh==2.0.1) (3.0.3)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /usr/local/lib/python3.8/dist-packages (from mxnet<2.0.0) (0.8.4)
Requirement already satisfied: requests<3,>=2.20.0 in /usr/local/lib/python3.8/dist-packages (from mxnet<2.0.0) (2.27.1)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from Jinja2>=2.7->bokeh==2.0.1) (2.0.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging>=16.8->bokeh==2.0.1) (3.0.7)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.1->bokeh==2.0.1) (1.16.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (1.26.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2.0.10)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (3.3)
Installing collected packages: mxnet, bokeh
  Attempting uninstall: bokeh
    Found existing installation: bokeh 2.4.2
    Uninstalling bokeh-2.4.2:
      Successfully uninstalled bokeh-2.4.2
Successfully installed bokeh-2.0.1 mxnet-1.9.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting autogluon
  Downloading autogluon-0.6.2-py3-none-any.whl (9.8 kB)
Collecting autogluon.timeseries[all]==0.6.2
  Downloading autogluon.timeseries-0.6.2-py3-none-any.whl (103 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.6/103.6 kB 230.5 MB/s eta 0:00:00
Collecting autogluon.features==0.6.2
  Downloading autogluon.features-0.6.2-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.0/60.0 kB 230.6 MB/s eta 0:00:00
Collecting autogluon.multimodal==0.6.2
  Downloading autogluon.multimodal-0.6.2-py3-none-any.whl (303 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 303.4/303.4 kB 246.6 MB/s eta 0:00:00
Collecting autogluon.text==0.6.2
  Downloading autogluon.text-0.6.2-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.1/62.1 kB 206.2 MB/s eta 0:00:00
Collecting autogluon.tabular[all]==0.6.2
  Downloading autogluon.tabular-0.6.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 327.1 MB/s eta 0:00:00
Collecting autogluon.core[all]==0.6.2
  Downloading autogluon.core-0.6.2-py3-none-any.whl (226 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 226.5/226.5 kB 322.9 MB/s eta 0:00:00
Collecting autogluon.vision==0.6.2
  Downloading autogluon.vision-0.6.2-py3-none-any.whl (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.8/49.8 kB 202.3 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.38.0 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (4.39.0)
Requirement already satisfied: scipy<1.10.0,>=1.5.4 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (1.7.0)
Collecting numpy<1.24,>=1.21
  Downloading numpy-1.23.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 264.2 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: psutil<6,>=5.7.3 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (5.9.0)
Collecting autogluon.common==0.6.2
  Downloading autogluon.common-0.6.2-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.7/44.7 kB 191.4 MB/s eta 0:00:00
Collecting dask<=2021.11.2,>=2021.09.1
  Downloading dask-2021.11.2-py3-none-any.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 345.8 MB/s eta 0:00:00
Collecting networkx<3.0,>=2.3
  Downloading networkx-2.8.8-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 216.1 MB/s eta 0:00:00
Requirement already satisfied: pandas!=1.4.0,<1.6,>=1.2.5 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (1.3.0)
Collecting distributed<=2021.11.2,>=2021.09.1
  Downloading distributed-2021.11.2-py3-none-any.whl (802 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 802.2/802.2 kB 239.4 MB/s eta 0:00:00
Requirement already satisfied: boto3 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (1.20.42)
Requirement already satisfied: scikit-learn<1.2,>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (1.0.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (3.5.1)
Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from autogluon.core[all]==0.6.2->autogluon) (2.27.1)
Collecting hyperopt<0.2.8,>=0.2.7
  Downloading hyperopt-0.2.7-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 118.5 MB/s eta 0:00:00
Collecting ray<2.1,>=2.0
  Downloading ray-2.0.1-cp38-cp38-manylinux2014_x86_64.whl (60.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.2/60.2 MB 172.0 MB/s eta 0:00:0000:0100:01
Collecting torchmetrics<0.9.0,>=0.8.0
  Downloading torchmetrics-0.8.2-py3-none-any.whl (409 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 409.8/409.8 kB 308.1 MB/s eta 0:00:00
Collecting jsonschema<=4.8.0
  Downloading jsonschema-4.8.0-py3-none-any.whl (81 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.4/81.4 kB 239.8 MB/s eta 0:00:00
Collecting timm<0.7.0
  Downloading timm-0.6.12-py3-none-any.whl (549 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 340.3 MB/s eta 0:00:00
Collecting openmim<=0.2.1,>0.1.5
  Downloading openmim-0.2.1-py2.py3-none-any.whl (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.7/49.7 kB 177.1 MB/s eta 0:00:00
Collecting accelerate<0.14,>=0.9
  Downloading accelerate-0.13.2-py3-none-any.whl (148 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 148.8/148.8 kB 297.4 MB/s eta 0:00:00
Collecting Pillow<=9.4.0,>=9.3.0
  Downloading Pillow-9.4.0-cp38-cp38-manylinux_2_28_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 107.3 MB/s eta 0:00:00
Collecting seqeval<=1.2.2
  Downloading seqeval-1.2.2.tar.gz (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.6/43.6 kB 185.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting pytorch-lightning<1.8.0,>=1.7.4
  Downloading pytorch_lightning-1.7.7-py3-none-any.whl (708 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 708.1/708.1 kB 308.8 MB/s eta 0:00:00
Collecting fairscale<=0.4.6,>=0.4.5
  Downloading fairscale-0.4.6.tar.gz (248 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 248.2/248.2 kB 324.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting evaluate<=0.3.0
  Downloading evaluate-0.3.0-py3-none-any.whl (72 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.9/72.9 kB 236.6 MB/s eta 0:00:00
Collecting scikit-image<0.20.0,>=0.19.1
  Downloading scikit_image-0.19.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 222.0 MB/s eta 0:00:00a 0:00:01
Collecting transformers<4.24.0,>=4.23.0
  Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.3/5.3 MB 246.1 MB/s eta 0:00:00
Collecting torchtext<0.14.0
  Downloading torchtext-0.13.1-cp38-cp38-manylinux1_x86_64.whl (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 311.2 MB/s eta 0:00:00
Collecting nptyping<1.5.0,>=1.4.4
  Downloading nptyping-1.4.4-py3-none-any.whl (31 kB)
Collecting text-unidecode<=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.2/78.2 kB 248.5 MB/s eta 0:00:00
Collecting pytorch-metric-learning<1.4.0,>=1.3.0
  Downloading pytorch_metric_learning-1.3.2-py3-none-any.whl (109 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.4/109.4 kB 285.9 MB/s eta 0:00:00
Collecting albumentations<=1.2.0,>=1.1.0
  Downloading albumentations-1.2.0-py3-none-any.whl (113 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 113.5/113.5 kB 9.9 MB/s eta 0:00:00
Collecting torch<1.13,>=1.9
  Downloading torch-1.12.1-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.3/776.3 MB 209.7 MB/s eta 0:00:0000:0100:01
Collecting torchvision<0.14.0
  Downloading torchvision-0.13.1-cp38-cp38-manylinux1_x86_64.whl (19.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 217.5 MB/s eta 0:00:00a 0:00:01
Collecting nlpaug<=1.1.10,>=1.1.10
  Downloading nlpaug-1.1.10-py3-none-any.whl (410 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.8/410.8 kB 326.5 MB/s eta 0:00:00
Collecting smart-open<5.3.0,>=5.2.1
  Downloading smart_open-5.2.1-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.6/58.6 kB 219.2 MB/s eta 0:00:00
Collecting sentencepiece<0.2.0,>=0.1.95
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 353.5 MB/s eta 0:00:00
Collecting omegaconf<2.2.0,>=2.1.1
  Downloading omegaconf-2.1.2-py3-none-any.whl (74 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.7/74.7 kB 249.7 MB/s eta 0:00:00
Collecting defusedxml<=0.7.1,>=0.7.1
  Downloading defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Collecting nltk<4.0.0,>=3.4.5
  Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 356.7 MB/s eta 0:00:00
Collecting fastai<2.8,>=2.3.1
  Downloading fastai-2.7.10-py3-none-any.whl (240 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.9/240.9 kB 315.8 MB/s eta 0:00:00
Collecting xgboost<1.8,>=1.6
  Downloading xgboost-1.7.3-py3-none-manylinux2014_x86_64.whl (193.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.6/193.6 MB 239.4 MB/s eta 0:00:00a 0:00:01
Collecting catboost<1.2,>=1.0
  Downloading catboost-1.1.1-cp38-none-manylinux1_x86_64.whl (76.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.6/76.6 MB 237.9 MB/s eta 0:00:00a 0:00:01
Collecting lightgbm<3.4,>=3.3
  Downloading lightgbm-3.3.4-py3-none-manylinux1_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 338.7 MB/s eta 0:00:00
Collecting statsmodels~=0.13.0
  Downloading statsmodels-0.13.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 213.8 MB/s eta 0:00:00a 0:00:01
Collecting gluonts~=0.11.0
  Downloading gluonts-0.11.8-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 258.8 MB/s eta 0:00:00
Requirement already satisfied: joblib~=1.1 in /usr/local/lib/python3.8/dist-packages (from autogluon.timeseries[all]==0.6.2->autogluon) (1.1.0)
Collecting tbats~=1.1
  Downloading tbats-1.1.2-py3-none-any.whl (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 188.6 MB/s eta 0:00:00
Collecting pmdarima~=1.8.2
  Downloading pmdarima-1.8.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 350.1 MB/s eta 0:00:00
Collecting sktime<0.14,>=0.13.1
  Downloading sktime-0.13.4-py3-none-any.whl (7.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 96.3 MB/s eta 0:00:00a 0:00:01
Collecting gluoncv<0.10.6,>=0.10.5
  Downloading gluoncv-0.10.5.post0-py2.py3-none-any.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 347.2 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from autogluon.common==0.6.2->autogluon.core[all]==0.6.2->autogluon) (66.1.1)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from accelerate<0.14,>=0.9->autogluon.multimodal==0.6.2->autogluon) (21.3)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.8/dist-packages (from accelerate<0.14,>=0.9->autogluon.multimodal==0.6.2->autogluon) (5.4.1)
Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.7.0.68-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.2/49.2 MB 184.7 MB/s eta 0:00:00a 0:00:01
Collecting qudida>=0.0.4
  Downloading qudida-0.0.4-py3-none-any.whl (3.5 kB)
Collecting albumentations<=1.2.0,>=1.1.0
  Downloading albumentations-1.1.0-py3-none-any.whl (102 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.4/102.4 kB 265.2 MB/s eta 0:00:00
Requirement already satisfied: plotly in /usr/local/lib/python3.8/dist-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.2->autogluon) (5.5.0)
Requirement already satisfied: graphviz in /usr/local/lib/python3.8/dist-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.2->autogluon) (0.8.4)
Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.2->autogluon) (1.16.0)
Collecting partd>=0.3.10
  Downloading partd-1.3.0-py3-none-any.whl (18 kB)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.2->autogluon) (2022.1.0)
Collecting toolz>=0.8.2
  Downloading toolz-0.12.0-py3-none-any.whl (55 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.8/55.8 kB 214.2 MB/s eta 0:00:00
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.2->autogluon) (2.0.0)
Collecting click>=6.6
  Downloading click-8.1.3-py3-none-any.whl (96 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 kB 181.1 MB/s eta 0:00:00
Collecting sortedcontainers!=2.0.0,!=2.0.1
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting msgpack>=0.6.0
  Downloading msgpack-1.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.5/322.5 kB 308.7 MB/s eta 0:00:00
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.2->autogluon) (6.1)
Collecting zict>=0.1.3
  Downloading zict-2.2.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.2->autogluon) (3.0.3)
Collecting tblib>=1.6.0
  Downloading tblib-1.7.0-py2.py3-none-any.whl (12 kB)
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.8/dist-packages (from evaluate<=0.3.0->autogluon.multimodal==0.6.2->autogluon) (0.70.12.2)
Requirement already satisfied: dill in /usr/local/lib/python3.8/dist-packages (from evaluate<=0.3.0->autogluon.multimodal==0.6.2->autogluon) (0.3.4)
Collecting tqdm>=4.38.0
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 223.8 MB/s eta 0:00:00
Collecting xxhash
  Downloading xxhash-3.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.0/213.0 kB 309.6 MB/s eta 0:00:00
Collecting datasets>=2.0.0
  Downloading datasets-2.8.0-py3-none-any.whl (452 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 452.9/452.9 kB 342.9 MB/s eta 0:00:00
Collecting huggingface-hub>=0.7.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 182.4/182.4 kB 309.1 MB/s eta 0:00:00
Collecting fastprogress>=0.2.4
  Downloading fastprogress-1.0.3-py3-none-any.whl (12 kB)
Collecting fastcore<1.6,>=1.4.5
  Downloading fastcore-1.5.27-py3-none-any.whl (67 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.1/67.1 kB 234.8 MB/s eta 0:00:00
Collecting fastdownload<2,>=0.0.5
  Downloading fastdownload-0.0.7-py3-none-any.whl (12 kB)
Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.6.2->autogluon) (22.3.1)
Collecting spacy<4
  Downloading spacy-3.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.7/6.7 MB 251.0 MB/s eta 0:00:00
Collecting autocfg
  Downloading autocfg-0.0.8-py3-none-any.whl (13 kB)
Collecting yacs
  Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Requirement already satisfied: portalocker in /usr/local/lib/python3.8/dist-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.6.2->autogluon) (2.3.2)
Requirement already satisfied: opencv-python in /usr/local/lib/python3.8/dist-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.6.2->autogluon) (4.5.5.62)
Collecting pydantic~=1.7
  Downloading pydantic-1.10.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 309.0 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.8/dist-packages (from gluonts~=0.11.0->autogluon.timeseries[all]==0.6.2->autogluon) (4.0.1)
Collecting future
  Downloading future-0.18.3.tar.gz (840 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 840.9/840.9 kB 361.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting py4j
  Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 176.3 MB/s eta 0:00:00
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema<=4.8.0->autogluon.multimodal==0.6.2->autogluon) (21.4.0)
Collecting importlib-resources>=1.4.0
  Downloading importlib_resources-5.10.2-py3-none-any.whl (34 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
  Downloading pyrsistent-0.19.3-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.5/57.5 kB 197.7 MB/s eta 0:00:00
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (from lightgbm<3.4,>=3.3->autogluon.tabular[all]==0.6.2->autogluon) (0.38.4)
Collecting regex>=2021.8.3
  Downloading regex-2022.10.31-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (772 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 772.3/772.3 kB 360.3 MB/s eta 0:00:00
Collecting typish>=1.7.0
  Downloading typish-1.9.3-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.1/45.1 kB 175.1 MB/s eta 0:00:00
Collecting antlr4-python3-runtime==4.8
  Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 112.4/112.4 kB 295.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting rich
  Downloading rich-13.2.0-py3-none-any.whl (238 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 238.9/238.9 kB 327.9 MB/s eta 0:00:00
Requirement already satisfied: tabulate in /usr/local/lib/python3.8/dist-packages (from openmim<=0.2.1,>0.1.5->autogluon.multimodal==0.6.2->autogluon) (0.8.9)
Collecting model-index
  Downloading model_index-0.1.11-py3-none-any.whl (34 kB)
Requirement already satisfied: colorama in /usr/local/lib/python3.8/dist-packages (from openmim<=0.2.1,>0.1.5->autogluon.multimodal==0.6.2->autogluon) (0.4.3)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas!=1.4.0,<1.6,>=1.2.5->autogluon.core[all]==0.6.2->autogluon) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas!=1.4.0,<1.6,>=1.2.5->autogluon.core[all]==0.6.2->autogluon) (2.8.2)
Requirement already satisfied: Cython!=0.29.18,>=0.29 in /usr/local/lib/python3.8/dist-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.6.2->autogluon) (0.29.26)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.6.2->autogluon) (1.26.8)
Collecting pyDeprecate>=0.3.1
  Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Collecting tensorboard>=2.9.1
  Downloading tensorboard-2.11.2-py3-none-any.whl (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 241.3 MB/s eta 0:00:00
Requirement already satisfied: protobuf<4.0.0,>=3.15.3 in /usr/local/lib/python3.8/dist-packages (from ray<2.1,>=2.0->autogluon.core[all]==0.6.2->autogluon) (3.19.3)
Collecting filelock
  Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting grpcio<=1.43.0,>=1.32.0
  Downloading grpcio-1.43.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 260.6 MB/s eta 0:00:00
Collecting frozenlist
  Downloading frozenlist-1.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (161 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.3/161.3 kB 290.5 MB/s eta 0:00:00
Collecting aiosignal
  Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting virtualenv
  Downloading virtualenv-20.17.1-py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 216.3 MB/s eta 0:00:00a 0:00:01
Collecting click>=6.6
  Downloading click-8.0.4-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 261.3 MB/s eta 0:00:00
Collecting tensorboardX>=1.9
  Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.4/125.4 kB 255.9 MB/s eta 0:00:00
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->autogluon.core[all]==0.6.2->autogluon) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.8/dist-packages (from requests->autogluon.core[all]==0.6.2->autogluon) (2.0.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests->autogluon.core[all]==0.6.2->autogluon) (2021.10.8)
Collecting PyWavelets>=1.1.1
  Downloading PyWavelets-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 263.5 MB/s eta 0:00:00
Collecting tifffile>=2019.7.26
  Downloading tifffile-2023.1.23.1-py3-none-any.whl (214 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 214.8/214.8 kB 286.4 MB/s eta 0:00:00
Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.8/dist-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.6.2->autogluon) (2.14.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from scikit-learn<1.2,>=1.0.0->autogluon.core[all]==0.6.2->autogluon) (3.0.0)
Collecting numpy<1.24,>=1.21
  Downloading numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.9/16.9 MB 172.6 MB/s eta 0:00:00a 0:00:01
Collecting deprecated>=1.2.13
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: numba>=0.53 in /usr/local/lib/python3.8/dist-packages (from sktime<0.14,>=0.13.1->autogluon.timeseries[all]==0.6.2->autogluon) (0.55.0)
Collecting patsy>=0.5.2
  Downloading patsy-0.5.3-py2.py3-none-any.whl (233 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 233.8/233.8 kB 290.4 MB/s eta 0:00:00
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 274.3 MB/s eta 0:00:00
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /usr/local/lib/python3.8/dist-packages (from boto3->autogluon.core[all]==0.6.2->autogluon) (0.5.0)
Requirement already satisfied: botocore<1.24.0,>=1.23.42 in /usr/local/lib/python3.8/dist-packages (from boto3->autogluon.core[all]==0.6.2->autogluon) (1.23.42)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.8/dist-packages (from boto3->autogluon.core[all]==0.6.2->autogluon) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->autogluon.core[all]==0.6.2->autogluon) (1.3.2)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib->autogluon.core[all]==0.6.2->autogluon) (4.29.0)
Requirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->autogluon.core[all]==0.6.2->autogluon) (3.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib->autogluon.core[all]==0.6.2->autogluon) (0.11.0)
Collecting aiohttp
  Downloading aiohttp-3.8.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 348.0 MB/s eta 0:00:00
Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.8/dist-packages (from datasets>=2.0.0->evaluate<=0.3.0->autogluon.multimodal==0.6.2->autogluon) (6.0.1)
Collecting wrapt<2,>=1.10
  Downloading wrapt-1.14.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (81 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.0/81.0 kB 218.8 MB/s eta 0:00:00
Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0->jsonschema<=4.8.0->autogluon.multimodal==0.6.2->autogluon) (3.7.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.53->sktime<0.14,>=0.13.1->autogluon.timeseries[all]==0.6.2->autogluon) (0.38.0)
Collecting numpy<1.24,>=1.21
  Downloading numpy-1.21.6-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 176.4 MB/s eta 0:00:00a 0:00:01
Collecting locket
  Downloading locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Collecting typing-extensions~=4.0
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting thinc<8.2.0,>=8.1.0
  Downloading thinc-8.1.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (828 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 828.9/828.9 kB 350.7 MB/s eta 0:00:00
Collecting srsly<3.0.0,>=2.4.3
  Downloading srsly-2.4.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 492.6/492.6 kB 327.6 MB/s eta 0:00:00
Collecting langcodes<4.0.0,>=3.2.0
  Downloading langcodes-3.3.0-py3-none-any.whl (181 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.6/181.6 kB 271.5 MB/s eta 0:00:00
Collecting spacy-loggers<2.0.0,>=1.0.0
  Downloading spacy_loggers-1.0.4-py3-none-any.whl (11 kB)
Collecting cymem<2.1.0,>=2.0.2
  Downloading cymem-2.0.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36 kB)
Collecting typer<0.8.0,>=0.3.0
  Downloading typer-0.7.0-py3-none-any.whl (38 kB)
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.9-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.8/130.8 kB 294.1 MB/s eta 0:00:00
Collecting spacy-legacy<3.1.0,>=3.0.11
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl (29 kB)
Collecting pathy>=0.10.0
  Downloading pathy-0.10.1-py3-none-any.whl (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 231.6 MB/s eta 0:00:00
Collecting wasabi<1.2.0,>=0.9.1
  Downloading wasabi-1.1.1-py3-none-any.whl (27 kB)
Collecting catalogue<2.1.0,>=2.0.6
  Downloading catalogue-2.0.8-py3-none-any.whl (17 kB)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Downloading tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 274.6 MB/s eta 0:00:00
Collecting markdown>=2.6.8
  Downloading Markdown-3.4.1-py3-none-any.whl (93 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93.3/93.3 kB 250.5 MB/s eta 0:00:00
Collecting google-auth<3,>=1.6.3
  Downloading google_auth-2.16.0-py2.py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.8/177.8 kB 303.7 MB/s eta 0:00:00
Collecting absl-py>=0.4
  Downloading absl_py-1.4.0-py3-none-any.whl (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.5/126.5 kB 287.5 MB/s eta 0:00:00
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (2.0.2)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Downloading google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting tensorboard-plugin-wit>=1.6.0
  Downloading tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 781.3/781.3 kB 357.0 MB/s eta 0:00:00
Collecting heapdict
  Downloading HeapDict-1.0.1-py3-none-any.whl (3.9 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.2->autogluon) (2.0.1)
Collecting ordered-set
  Downloading ordered_set-4.1.0-py3-none-any.whl (7.6 kB)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.8/dist-packages (from plotly->catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.2->autogluon) (8.0.1)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.8/dist-packages (from rich->openmim<=0.2.1,>0.1.5->autogluon.multimodal==0.6.2->autogluon) (2.14.0)
Collecting markdown-it-py<3.0.0,>=2.1.0
  Downloading markdown_it_py-2.1.0-py3-none-any.whl (84 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.5/84.5 kB 245.6 MB/s eta 0:00:00
Collecting platformdirs<3,>=2.4
  Downloading platformdirs-2.6.2-py3-none-any.whl (14 kB)
Collecting distlib<1,>=0.3.6
  Downloading distlib-0.3.6-py2.py3-none-any.whl (468 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.5/468.5 kB 351.2 MB/s eta 0:00:00
Collecting pyasn1-modules>=0.2.1
  Downloading pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 192.3 MB/s eta 0:00:00
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (4.7.2)
Collecting cachetools<6.0,>=2.0.0
  Downloading cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Collecting requests-oauthlib>=0.7.0
  Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.8/dist-packages (from markdown>=2.6.8->tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (4.10.1)
Collecting mdurl~=0.1
  Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Collecting confection<1.0.0,>=0.0.1
  Downloading confection-0.0.4-py3-none-any.whl (32 kB)
Collecting blis<0.8.0,>=0.7.8
  Downloading blis-0.7.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 228.5 MB/s eta 0:00:00a 0:00:01
Collecting yarl<2.0,>=1.0
  Downloading yarl-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (262 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 262.1/262.1 kB 307.4 MB/s eta 0:00:00
Collecting async-timeout<5.0,>=4.0.0a3
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting multidict<7.0,>=4.5
  Downloading multidict-6.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.3/121.3 kB 275.5 MB/s eta 0:00:00
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.8/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (0.4.8)
Collecting oauthlib>=3.0.0
  Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 151.7/151.7 kB 306.9 MB/s eta 0:00:00
Building wheels for collected packages: fairscale, antlr4-python3-runtime, seqeval, future
  Building wheel for fairscale (pyproject.toml) ... done
  Created wheel for fairscale: filename=fairscale-0.4.6-py3-none-any.whl size=307224 sha256=1150205fdf93ac4671be1dcd864a94c69055e9f56f2e5ddc638e35293dbbeee9
  Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/60/e8/f1/4f2cc869823c35e834c6cee0552a0605c2bdc89f7da81f1a1d
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141211 sha256=d53fafbd0f048981ae67cdbe3d132d57709f4ea36d6e0bd9124fdf9d1eceda37
  Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/34/d7/fe/a833ceccaee881c6f8cd49985ee4285bf94c5cf2c66ea5db68
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16164 sha256=5fb3a6f1ebf73bdc2f7679ae4b8ad4f630a56c56aafb9408ac63aae1b882e704
  Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/e3/30/9b/6b670dac34775f2b7cc4e9b172202e81fbb4f9cdb103c1ca66
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492025 sha256=77a91421a9d28e092163c947558bb81a1b74e973ebf164a9d4765a922858f5c0
  Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/a6/db/41/71a0e5d071a14e716cc11bb021a9caa8f76ec337eca071487e
Successfully built fairscale antlr4-python3-runtime seqeval future
Installing collected packages: typish, tokenizers, text-unidecode, tensorboard-plugin-wit, sortedcontainers, sentencepiece, py4j, msgpack, heapdict, distlib, cymem, antlr4-python3-runtime, zict, yacs, xxhash, wrapt, wasabi, typing-extensions, tqdm, toolz, tensorboard-data-server, tblib, spacy-loggers, spacy-legacy, smart-open, regex, pyrsistent, pyDeprecate, pyasn1-modules, platformdirs, Pillow, ordered-set, omegaconf, oauthlib, numpy, networkx, murmurhash, multidict, mdurl, locket, langcodes, importlib-resources, grpcio, future, frozenlist, filelock, fastprogress, defusedxml, click, catalogue, cachetools, autocfg, async-timeout, absl-py, yarl, virtualenv, typer, torch, tifffile, tensorboardX, srsly, responses, requests-oauthlib, PyWavelets, pydantic, preshed, patsy, partd, opencv-python-headless, nptyping, nltk, markdown-it-py, markdown, jsonschema, huggingface-hub, google-auth, fastcore, deprecated, blis, aiosignal, xgboost, transformers, torchvision, torchtext, torchmetrics, statsmodels, scikit-image, rich, ray, pathy, nlpaug, model-index, hyperopt, google-auth-oauthlib, gluonts, gluoncv, fastdownload, fairscale, dask, confection, catboost, aiohttp, accelerate, timm, thinc, tensorboard, sktime, seqeval, qudida, pytorch-metric-learning, pmdarima, openmim, lightgbm, distributed, tbats, spacy, pytorch-lightning, datasets, autogluon.common, albumentations, fastai, evaluate, autogluon.features, autogluon.core, autogluon.tabular, autogluon.multimodal, autogluon.vision, autogluon.timeseries, autogluon.text, autogluon
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.0.1
    Uninstalling typing_extensions-4.0.1:
      Successfully uninstalled typing_extensions-4.0.1
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.39.0
    Uninstalling tqdm-4.39.0:
      Successfully uninstalled tqdm-4.39.0
  Attempting uninstall: Pillow
    Found existing installation: Pillow 9.0.0
    Uninstalling Pillow-9.0.0:
      Successfully uninstalled Pillow-9.0.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.1
    Uninstalling numpy-1.19.1:
      Successfully uninstalled numpy-1.19.1
  Attempting uninstall: gluoncv
    Found existing installation: gluoncv 0.8.0
    Uninstalling gluoncv-0.8.0:
      Successfully uninstalled gluoncv-0.8.0
Successfully installed Pillow-9.4.0 PyWavelets-1.4.1 absl-py-1.4.0 accelerate-0.13.2 aiohttp-3.8.3 aiosignal-1.3.1 albumentations-1.1.0 antlr4-python3-runtime-4.8 async-timeout-4.0.2 autocfg-0.0.8 autogluon-0.6.2 autogluon.common-0.6.2 autogluon.core-0.6.2 autogluon.features-0.6.2 autogluon.multimodal-0.6.2 autogluon.tabular-0.6.2 autogluon.text-0.6.2 autogluon.timeseries-0.6.2 autogluon.vision-0.6.2 blis-0.7.9 cachetools-5.3.0 catalogue-2.0.8 catboost-1.1.1 click-8.0.4 confection-0.0.4 cymem-2.0.7 dask-2021.11.2 datasets-2.8.0 defusedxml-0.7.1 deprecated-1.2.13 distlib-0.3.6 distributed-2021.11.2 evaluate-0.3.0 fairscale-0.4.6 fastai-2.7.10 fastcore-1.5.27 fastdownload-0.0.7 fastprogress-1.0.3 filelock-3.9.0 frozenlist-1.3.3 future-0.18.3 gluoncv-0.10.5.post0 gluonts-0.11.8 google-auth-2.16.0 google-auth-oauthlib-0.4.6 grpcio-1.43.0 heapdict-1.0.1 huggingface-hub-0.11.1 hyperopt-0.2.7 importlib-resources-5.10.2 jsonschema-4.8.0 langcodes-3.3.0 lightgbm-3.3.4 locket-1.0.0 markdown-3.4.1 markdown-it-py-2.1.0 mdurl-0.1.2 model-index-0.1.11 msgpack-1.0.4 multidict-6.0.4 murmurhash-1.0.9 networkx-2.8.8 nlpaug-1.1.10 nltk-3.8.1 nptyping-1.4.4 numpy-1.21.6 oauthlib-3.2.2 omegaconf-2.1.2 opencv-python-headless-4.7.0.68 openmim-0.2.1 ordered-set-4.1.0 partd-1.3.0 pathy-0.10.1 patsy-0.5.3 platformdirs-2.6.2 pmdarima-1.8.5 preshed-3.0.8 py4j-0.10.9.7 pyDeprecate-0.3.2 pyasn1-modules-0.2.8 pydantic-1.10.4 pyrsistent-0.19.3 pytorch-lightning-1.7.7 pytorch-metric-learning-1.3.2 qudida-0.0.4 ray-2.0.1 regex-2022.10.31 requests-oauthlib-1.3.1 responses-0.18.0 rich-13.2.0 scikit-image-0.19.3 sentencepiece-0.1.97 seqeval-1.2.2 sktime-0.13.4 smart-open-5.2.1 sortedcontainers-2.4.0 spacy-3.5.0 spacy-legacy-3.0.12 spacy-loggers-1.0.4 srsly-2.4.5 statsmodels-0.13.5 tbats-1.1.2 tblib-1.7.0 tensorboard-2.11.2 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorboardX-2.5.1 text-unidecode-1.3 thinc-8.1.7 tifffile-2023.1.23.1 timm-0.6.12 tokenizers-0.13.2 toolz-0.12.0 torch-1.12.1 torchmetrics-0.8.2 torchtext-0.13.1 torchvision-0.13.1 tqdm-4.64.1 transformers-4.23.1 typer-0.7.0 typing-extensions-4.4.0 typish-1.9.3 virtualenv-20.17.1 wasabi-1.1.1 wrapt-1.14.1 xgboost-1.7.3 xxhash-3.2.0 yacs-0.1.8 yarl-1.8.2 zict-2.2.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
In [33]:
cd /root
/root
In [34]:
ls
AutogluonModels/             submission.csv
bike-sharing-demand.zip      submission_new_features.csv
cd0385-project-starter/      submission_new_features_2.csv
histogram_hours_feature.png  submission_new_hpo.csv
model_test_score.png         test.csv
model_train_score.png        train.csv
sampleSubmission.csv

Setup Kaggle API Key¶

In [35]:
!pip install -U kaggle
Collecting kaggle
  Using cached kaggle-1.5.12-py3-none-any.whl
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.8/dist-packages (from kaggle) (1.16.0)
Collecting python-slugify
  Using cached python_slugify-7.0.0-py2.py3-none-any.whl (9.4 kB)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from kaggle) (1.26.8)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from kaggle) (4.64.1)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.8/dist-packages (from kaggle) (2.8.2)
Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from kaggle) (2.27.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.8/dist-packages (from kaggle) (2021.10.8)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.8/dist-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.8/dist-packages (from requests->kaggle) (2.0.10)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->kaggle) (3.3)
Installing collected packages: python-slugify, kaggle
Successfully installed kaggle-1.5.12 python-slugify-7.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
In [36]:
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
In [4]:
# Fill in your user name and key from creating the kaggle account and API token file
import json
kaggle_username = "pghugare"
kaggle_key = "6add27806572c978578c6e3757086862"

# Save API token the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))

Download and explore dataset¶

Go to the bike sharing demand competition and agree to the terms¶

kaggle6.png

In [18]:
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it you can use the -o command to overwrite the file
!unzip -o bike-sharing-demand.zip
Downloading bike-sharing-demand.zip to /root
  0%|                                                | 0.00/189k [00:00<?, ?B/s]
100%|████████████████████████████████████████| 189k/189k [00:00<00:00, 6.16MB/s]
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv    
  inflating: test.csv                
  inflating: train.csv               
In [5]:
import pandas as pd
from autogluon.tabular import TabularPredictor
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
In [8]:
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('train.csv', parse_dates=['datetime'])
train.head()
Out[8]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 3 13 16
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 8 32 40
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 5 27 32
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 3 10 13
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 0 1 1
In [ ]:
# Simple output of the train dataset to view some of the min/max/varition of the dataset features.
In [9]:
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv('test.csv', parse_dates=['datetime'])
test.head()
Out[9]:
datetime season holiday workingday weather temp atemp humidity windspeed
0 2011-01-20 00:00:00 1 0 1 1 10.66 11.365 56 26.0027
1 2011-01-20 01:00:00 1 0 1 1 10.66 13.635 56 0.0000
2 2011-01-20 02:00:00 1 0 1 1 10.66 13.635 56 0.0000
3 2011-01-20 03:00:00 1 0 1 1 10.66 12.880 56 11.0014
4 2011-01-20 04:00:00 1 0 1 1 10.66 12.880 56 11.0014
In [10]:
# Same thing as train and test dataset
submission = pd.read_csv('sampleSubmission.csv', parse_dates=['datetime'])
submission.head()
Out[10]:
datetime count
0 2011-01-20 00:00:00 0
1 2011-01-20 01:00:00 0
2 2011-01-20 02:00:00 0
3 2011-01-20 03:00:00 0
4 2011-01-20 04:00:00 0

Step 3: Train a model using AutoGluon’s Tabular Prediction¶

Requirements:

  • We are prediting count, so it is the label we are setting.
  • Ignore casual and registered columns as they are also not present in the test dataset.
  • Use the root_mean_squared_error as the metric to use for evaluation.
  • Set a time limit of 10 minutes (600 seconds).
  • Use the preset best_quality to focus on creating the best model.
In [11]:
columns_to_ignore = ["casual", "registered"]
for column_name in columns_to_ignore:
    train.drop(column_name, axis='columns', inplace=True)
train.head()

## performance metric (RMSE by default)

predictor = TabularPredictor(label="count").fit(train_data = train, time_limit=600, presets=['best_quality'])
Out[11]:
datetime season holiday workingday weather temp atemp humidity windspeed count
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 16
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 40
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 32
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 13
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 1

Review AutoGluon's training run with ranking of models that did the best.¶

In [16]:
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -52.885174      11.174019  505.817829                0.000712           0.311318            3       True         15
1   RandomForestMSE_BAG_L2  -53.393270      10.297766  406.352121                0.606169          27.262657            2       True         12
2     ExtraTreesMSE_BAG_L2  -54.021078      10.297608  387.015604                0.606012           7.926141            2       True         14
3          LightGBM_BAG_L2  -55.101032       9.907717  400.566041                0.216120          21.476578            2       True         11
4          CatBoost_BAG_L2  -55.705478       9.745007  448.841135                0.053410          69.751672            2       True         13
5        LightGBMXT_BAG_L2  -60.705655      13.691409  432.604342                3.999813          53.514878            2       True         10
6    KNeighborsDist_BAG_L1  -84.125061       0.038620    0.029779                0.038620           0.029779            1       True          2
7      WeightedEnsemble_L2  -84.125061       0.039294    0.477109                0.000675           0.447330            2       True          9
8    KNeighborsUnif_BAG_L1 -101.546199       0.039820    0.032498                0.039820           0.032498            1       True          1
9   RandomForestMSE_BAG_L1 -116.544294       0.587877   10.327482                0.587877          10.327482            1       True          5
10    ExtraTreesMSE_BAG_L1 -124.588053       0.682145    4.782058                0.682145           4.782058            1       True          7
11         CatBoost_BAG_L1 -130.485847       0.124086  196.887096                0.124086         196.887096            1       True          6
12         LightGBM_BAG_L1 -131.054162       1.392111   26.093218                1.392111          26.093218            1       True          4
13       LightGBMXT_BAG_L1 -131.460909       6.493901   60.742787                6.493901          60.742787            1       True          3
14  NeuralNetFastAI_BAG_L1 -136.539545       0.333037   80.194546                0.333037          80.194546            1       True          8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_XT', 'WeightedEnsembleModel'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])                : 3 | ['temp', 'atemp', 'windspeed']
('int', [])                  : 3 | ['season', 'weather', 'humidity']
('int', ['bool'])            : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20230123_035239/SummaryOfModels.html
*** End of fit() summary ***
Out[16]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
  'KNeighborsDist_BAG_L1': -84.12506123181602,
  'LightGBMXT_BAG_L1': -131.46090891834504,
  'LightGBM_BAG_L1': -131.054161598899,
  'RandomForestMSE_BAG_L1': -116.54429428704391,
  'CatBoost_BAG_L1': -130.48584656124748,
  'ExtraTreesMSE_BAG_L1': -124.58805258915959,
  'NeuralNetFastAI_BAG_L1': -136.5395454996815,
  'WeightedEnsemble_L2': -84.12506123181602,
  'LightGBMXT_BAG_L2': -60.705655042275914,
  'LightGBM_BAG_L2': -55.10103226835344,
  'RandomForestMSE_BAG_L2': -53.39326979793196,
  'CatBoost_BAG_L2': -55.705477592793756,
  'ExtraTreesMSE_BAG_L2': -54.02107813705378,
  'WeightedEnsemble_L3': -52.88517418301905},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/LightGBMXT_BAG_L1/',
  'LightGBM_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/LightGBM_BAG_L1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/CatBoost_BAG_L1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/ExtraTreesMSE_BAG_L1/',
  'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/NeuralNetFastAI_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20230123_035239/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/LightGBMXT_BAG_L2/',
  'LightGBM_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/LightGBM_BAG_L2/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/CatBoost_BAG_L2/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/ExtraTreesMSE_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20230123_035239/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.03249764442443848,
  'KNeighborsDist_BAG_L1': 0.02977895736694336,
  'LightGBMXT_BAG_L1': 60.74278664588928,
  'LightGBM_BAG_L1': 26.093218088150024,
  'RandomForestMSE_BAG_L1': 10.327482461929321,
  'CatBoost_BAG_L1': 196.88709592819214,
  'ExtraTreesMSE_BAG_L1': 4.782058477401733,
  'NeuralNetFastAI_BAG_L1': 80.19454550743103,
  'WeightedEnsemble_L2': 0.4473297595977783,
  'LightGBMXT_BAG_L2': 53.514878034591675,
  'LightGBM_BAG_L2': 21.476577758789062,
  'RandomForestMSE_BAG_L2': 27.262657165527344,
  'CatBoost_BAG_L2': 69.75167155265808,
  'ExtraTreesMSE_BAG_L2': 7.926140785217285,
  'WeightedEnsemble_L3': 0.31131768226623535},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.039820194244384766,
  'KNeighborsDist_BAG_L1': 0.0386197566986084,
  'LightGBMXT_BAG_L1': 6.493901491165161,
  'LightGBM_BAG_L1': 1.39211106300354,
  'RandomForestMSE_BAG_L1': 0.5878767967224121,
  'CatBoost_BAG_L1': 0.12408566474914551,
  'ExtraTreesMSE_BAG_L1': 0.6821451187133789,
  'NeuralNetFastAI_BAG_L1': 0.3330366611480713,
  'WeightedEnsemble_L2': 0.0006747245788574219,
  'LightGBMXT_BAG_L2': 3.999812602996826,
  'LightGBM_BAG_L2': 0.21611976623535156,
  'RandomForestMSE_BAG_L2': 0.6061689853668213,
  'CatBoost_BAG_L2': 0.05341005325317383,
  'ExtraTreesMSE_BAG_L2': 0.6060116291046143,
  'WeightedEnsemble_L3': 0.0007116794586181641},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -52.885174      11.174019  505.817829   
 1   RandomForestMSE_BAG_L2  -53.393270      10.297766  406.352121   
 2     ExtraTreesMSE_BAG_L2  -54.021078      10.297608  387.015604   
 3          LightGBM_BAG_L2  -55.101032       9.907717  400.566041   
 4          CatBoost_BAG_L2  -55.705478       9.745007  448.841135   
 5        LightGBMXT_BAG_L2  -60.705655      13.691409  432.604342   
 6    KNeighborsDist_BAG_L1  -84.125061       0.038620    0.029779   
 7      WeightedEnsemble_L2  -84.125061       0.039294    0.477109   
 8    KNeighborsUnif_BAG_L1 -101.546199       0.039820    0.032498   
 9   RandomForestMSE_BAG_L1 -116.544294       0.587877   10.327482   
 10    ExtraTreesMSE_BAG_L1 -124.588053       0.682145    4.782058   
 11         CatBoost_BAG_L1 -130.485847       0.124086  196.887096   
 12         LightGBM_BAG_L1 -131.054162       1.392111   26.093218   
 13       LightGBMXT_BAG_L1 -131.460909       6.493901   60.742787   
 14  NeuralNetFastAI_BAG_L1 -136.539545       0.333037   80.194546   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000712           0.311318            3       True   
 1                 0.606169          27.262657            2       True   
 2                 0.606012           7.926141            2       True   
 3                 0.216120          21.476578            2       True   
 4                 0.053410          69.751672            2       True   
 5                 3.999813          53.514878            2       True   
 6                 0.038620           0.029779            1       True   
 7                 0.000675           0.447330            2       True   
 8                 0.039820           0.032498            1       True   
 9                 0.587877          10.327482            1       True   
 10                0.682145           4.782058            1       True   
 11                0.124086         196.887096            1       True   
 12                1.392111          26.093218            1       True   
 13                6.493901          60.742787            1       True   
 14                0.333037          80.194546            1       True   
 
     fit_order  
 0          15  
 1          12  
 2          14  
 3          11  
 4          13  
 5          10  
 6           2  
 7           9  
 8           1  
 9           5  
 10          7  
 11          6  
 12          4  
 13          3  
 14          8  }

Create predictions from test dataset¶

In [17]:
predictions = predictor.predict(test)
predictions.head()
Out[17]:
0    23.318344
1    42.508015
2    45.909454
3    48.781364
4    51.674591
Name: count, dtype: float32

NOTE: Kaggle will reject the submission if we don't set everything to be > 0.¶

In [18]:
# Describe the `predictions` series to see if there are any negative values
predictions.describe()
Out[18]:
count    6493.000000
mean      100.555389
std        90.140991
min         3.055882
25%        20.927584
50%        62.675346
75%       168.302856
max       364.284882
Name: count, dtype: float64
In [19]:
# How many negative values do we have?
(predictions < 0).sum()
Out[19]:
0
In [20]:
# Set them to zero
predictions[predictions < 0] = 0
In [21]:
predictions.describe()
Out[21]:
count    6493.000000
mean      100.555389
std        90.140991
min         3.055882
25%        20.927584
50%        62.675346
75%       168.302856
max       364.284882
Name: count, dtype: float64

Set predictions to submission dataframe, save, and submit¶

In [23]:
submission["count"] = predictions
submission.to_csv("submission.csv", index=False)
In [24]:
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 357kB/s]
Successfully submitted to Bike Sharing Demand

View submission via the command line or in the web browser under the competition's page - My Submissions¶

In [25]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName        date                 description           status    publicScore  privateScore  
--------------  -------------------  --------------------  --------  -----------  ------------  
submission.csv  2023-01-23 04:09:07  first raw submission  complete  1.80760      1.80760       

Initial score of 1.80760¶

Step 4: Exploratory Data Analysis and Creating an additional feature¶

  • Any additional feature will do, but a great suggestion would be to separate out the datetime into hour, day, or month parts.
In [26]:
# Create a histogram of all features to show the distribution of each one relative to the data. This is part of the exploritory data analysis
train.hist(figsize=(15,15))
Out[26]:
array([[<AxesSubplot:title={'center':'datetime'}>,
        <AxesSubplot:title={'center':'season'}>,
        <AxesSubplot:title={'center':'holiday'}>],
       [<AxesSubplot:title={'center':'workingday'}>,
        <AxesSubplot:title={'center':'weather'}>,
        <AxesSubplot:title={'center':'temp'}>],
       [<AxesSubplot:title={'center':'atemp'}>,
        <AxesSubplot:title={'center':'humidity'}>,
        <AxesSubplot:title={'center':'windspeed'}>],
       [<AxesSubplot:title={'center':'count'}>, <AxesSubplot:>,
        <AxesSubplot:>]], dtype=object)
In [16]:
# create a new feature from datetime object

train['year'] = train.datetime.dt.year
train['month'] = train.datetime.dt.month
train['day'] = train.datetime.dt.day
train['hour'] = train.datetime.dt.hour
train['weekday'] = train.datetime.dt.weekday


test['year'] = test.datetime.dt.year
test['month'] = test.datetime.dt.month
test['day'] = test.datetime.dt.day
test['hour'] = test.datetime.dt.hour
test['weekday'] = test.datetime.dt.weekday

train = train.drop('datetime', axis = 1)
test = test.drop('datetime', axis = 1)
train.info()
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      10886 non-null  int64  
 1   holiday     10886 non-null  int64  
 2   workingday  10886 non-null  int64  
 3   weather     10886 non-null  int64  
 4   temp        10886 non-null  float64
 5   atemp       10886 non-null  float64
 6   humidity    10886 non-null  int64  
 7   windspeed   10886 non-null  float64
 8   count       10886 non-null  int64  
 9   hour        10886 non-null  int64  
 10  year        10886 non-null  int64  
 11  month       10886 non-null  int64  
 12  day         10886 non-null  int64  
 13  weekday     10886 non-null  int64  
dtypes: float64(3), int64(11)
memory usage: 1.2 MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6493 entries, 0 to 6492
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      6493 non-null   int64  
 1   holiday     6493 non-null   int64  
 2   workingday  6493 non-null   int64  
 3   weather     6493 non-null   int64  
 4   temp        6493 non-null   float64
 5   atemp       6493 non-null   float64
 6   humidity    6493 non-null   int64  
 7   windspeed   6493 non-null   float64
 8   hour        6493 non-null   int64  
 9   year        6493 non-null   int64  
 10  month       6493 non-null   int64  
 11  day         6493 non-null   int64  
 12  weekday     6493 non-null   int64  
dtypes: float64(3), int64(10)
memory usage: 659.6 KB
In [17]:
train.corr()
Out[17]:
season holiday workingday weather temp atemp humidity windspeed count hour year month day weekday
season 1.000000 0.029368 -0.008126 0.008879 0.258689 0.264744 0.190610 -0.147121 0.163439 -0.006546 -0.004797 0.971524 0.001729 -0.010553
holiday 0.029368 1.000000 -0.250491 -0.007074 0.000295 -0.005215 0.001929 0.008409 -0.005393 -0.000354 0.012021 0.001731 -0.015877 -0.191832
workingday -0.008126 -0.250491 1.000000 0.033772 0.029966 0.024660 -0.010880 0.013373 0.011594 0.002780 -0.002482 -0.003394 0.009829 -0.704267
weather 0.008879 -0.007074 0.033772 1.000000 -0.055035 -0.055376 0.406244 0.007261 -0.128655 -0.022740 -0.012548 0.012144 -0.007890 -0.047692
temp 0.258689 0.000295 0.029966 -0.055035 1.000000 0.984948 -0.064949 -0.017852 0.394454 0.145430 0.061226 0.257589 0.015551 -0.038466
atemp 0.264744 -0.005215 0.024660 -0.055376 0.984948 1.000000 -0.043536 -0.057473 0.389784 0.140343 0.058540 0.264173 0.011866 -0.040235
humidity 0.190610 0.001929 -0.010880 0.406244 -0.064949 -0.043536 1.000000 -0.318607 -0.317371 -0.278011 -0.078606 0.204537 -0.011335 -0.026507
windspeed -0.147121 0.008409 0.013373 0.007261 -0.017852 -0.057473 -0.318607 1.000000 0.101369 0.146631 -0.015221 -0.150192 0.036157 -0.024804
count 0.163439 -0.005393 0.011594 -0.128655 0.394454 0.389784 -0.317371 0.101369 1.000000 0.400601 0.260403 0.166862 0.019826 -0.002283
hour -0.006546 -0.000354 0.002780 -0.022740 0.145430 0.140343 -0.278011 0.146631 0.400601 1.000000 -0.004234 -0.006818 0.001132 -0.002925
year -0.004797 0.012021 -0.002482 -0.012548 0.061226 0.058540 -0.078606 -0.015221 0.260403 -0.004234 1.000000 -0.004932 0.001800 -0.003785
month 0.971524 0.001731 -0.003394 0.012144 0.257589 0.264173 0.204537 -0.150192 0.166862 -0.006818 -0.004932 1.000000 0.001974 -0.002266
day 0.001729 -0.015877 0.009829 -0.007890 0.015551 0.011866 -0.011335 0.036157 0.019826 0.001132 0.001800 0.001974 1.000000 -0.011070
weekday -0.010553 -0.191832 -0.704267 -0.047692 -0.038466 -0.040235 -0.026507 -0.024804 -0.002283 -0.002925 -0.003785 -0.002266 -0.011070 1.000000
In [18]:
import seaborn as sns
sns.clustermap(train.corr())
Out[18]:
<seaborn.matrix.ClusterGrid at 0x7f9591f7cd00>
In [19]:
sns.pairplot(train)
Out[19]:
<seaborn.axisgrid.PairGrid at 0x7f9591f7c5b0>
In [20]:
import matplotlib.pyplot as plt


fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
train.plot(ax=axes[0, 0], x="year", y="count", kind="scatter")
train.plot(ax=axes[0, 1], x="month", y="count", kind="scatter")
train.plot(ax=axes[1, 0], x="day", y="count", kind="scatter")
train.plot(ax=axes[1, 1], x="hour", y="count", kind="scatter")
train.plot(ax=axes[1, 1], x="weekday", y="count", kind="scatter")
Out[20]:
<AxesSubplot:xlabel='weekday', ylabel='count'>

Make category types for these so models know they are not just numbers¶

  • AutoGluon originally sees these as ints, but in reality they are int representations of a category.
  • Setting the dtype to category will classify these as categories in AutoGluon.
In [21]:
train["season"] = train.season.astype('category')
train["weather"] = train.weather.astype('category')
train["holiday"] = train.holiday.astype('category')
train["workingday"] = train.workingday.astype('category')

test["season"] = test.season.astype('category')
test["weather"] = test.weather.astype('category')
test["holiday"] = test.holiday.astype('category')
test["workingday"] = test.workingday.astype('category')
In [22]:
# View are new feature
train.head()
Out[22]:
season holiday workingday weather temp atemp humidity windspeed count hour year month day weekday
0 1 0 0 1 9.84 14.395 81 0.0 16 0 2011 1 1 5
1 1 0 0 1 9.02 13.635 80 0.0 40 1 2011 1 1 5
2 1 0 0 1 9.02 13.635 80 0.0 32 2 2011 1 1 5
3 1 0 0 1 9.84 14.395 75 0.0 13 3 2011 1 1 5
4 1 0 0 1 9.84 14.395 75 0.0 1 4 2011 1 1 5
In [23]:
# View histogram of all features again now with the hour feature
train.hist(figsize=(15,15))
Out[23]:
array([[<AxesSubplot:title={'center':'temp'}>,
        <AxesSubplot:title={'center':'atemp'}>,
        <AxesSubplot:title={'center':'humidity'}>],
       [<AxesSubplot:title={'center':'windspeed'}>,
        <AxesSubplot:title={'center':'count'}>,
        <AxesSubplot:title={'center':'hour'}>],
       [<AxesSubplot:title={'center':'year'}>,
        <AxesSubplot:title={'center':'month'}>,
        <AxesSubplot:title={'center':'day'}>],
       [<AxesSubplot:title={'center':'weekday'}>, <AxesSubplot:>,
        <AxesSubplot:>]], dtype=object)
In [31]:
## Histogram - Hours Feature 
ax = train['hour'].hist()
ax.set_xlabel('hour')
ax.set_ylabel('# samples')
ax.set_title('Histogram - Hours Feature')
fig = ax.get_figure()
fig.savefig('histogram_hours_feature.png')

Step 5: Rerun the model with the same settings as before, just with more features¶

In [24]:
# verify columns and create train with new features
train_new_features = train[train.columns.to_list()]
train_new_features.head()
Out[24]:
season holiday workingday weather temp atemp humidity windspeed count hour year month day weekday
0 1 0 0 1 9.84 14.395 81 0.0 16 0 2011 1 1 5
1 1 0 0 1 9.02 13.635 80 0.0 40 1 2011 1 1 5
2 1 0 0 1 9.02 13.635 80 0.0 32 2 2011 1 1 5
3 1 0 0 1 9.84 14.395 75 0.0 13 3 2011 1 1 5
4 1 0 0 1 9.84 14.395 75 0.0 1 4 2011 1 1 5
In [25]:
predictor_new_features = TabularPredictor(label="count").fit(train_data = train_new_features, time_limit=600, presets=['best_quality'])
No path specified. Models will be saved in: "AutogluonModels/ag-20230124_040306/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20230124_040306/"
AutoGluon Version:  0.6.2
Python Version:     3.8.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Fri Dec 9 09:57:03 UTC 2022
Train Data Rows:    10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
	Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
	If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    2470.59 MB
	Train Data (Original)  Memory Usage: 0.83 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('category', []) : 4 | ['season', 'holiday', 'workingday', 'weather']
		('float', [])    : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])      : 6 | ['humidity', 'hour', 'year', 'month', 'day', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 2 | ['season', 'weather']
		('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
		('int', [])       : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
		('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
	0.1s = Fit runtime
	13 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.75 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.82s of the 599.87s of remaining time.
	-119.9788	 = Validation score   (-root_mean_squared_error)
	0.03s	 = Training   runtime
	0.16s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 396.22s of the 596.27s of remaining time.
	-115.0385	 = Validation score   (-root_mean_squared_error)
	0.03s	 = Training   runtime
	0.19s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 395.89s of the 595.94s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.7488	 = Validation score   (-root_mean_squared_error)
	81.6s	 = Training   runtime
	12.89s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 306.22s of the 506.27s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.4395	 = Validation score   (-root_mean_squared_error)
	41.29s	 = Training   runtime
	4.74s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 260.7s of the 460.76s of remaining time.
	-38.9875	 = Validation score   (-root_mean_squared_error)
	9.32s	 = Training   runtime
	0.55s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 248.46s of the 448.52s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.8825	 = Validation score   (-root_mean_squared_error)
	209.71s	 = Training   runtime
	0.18s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 35.55s of the 235.61s of remaining time.
	-38.9384	 = Validation score   (-root_mean_squared_error)
	5.02s	 = Training   runtime
	0.55s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 27.57s of the 227.62s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-88.6188	 = Validation score   (-root_mean_squared_error)
	42.78s	 = Training   runtime
	0.39s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 180.21s of remaining time.
	-32.7532	 = Validation score   (-root_mean_squared_error)
	0.61s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 179.53s of the 179.51s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.5966	 = Validation score   (-root_mean_squared_error)
	20.63s	 = Training   runtime
	0.34s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 155.34s of the 155.33s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-32.8836	 = Validation score   (-root_mean_squared_error)
	19.25s	 = Training   runtime
	0.13s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 132.52s of the 132.5s of remaining time.
	-33.1773	 = Validation score   (-root_mean_squared_error)
	26.99s	 = Training   runtime
	0.61s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 101.89s of the 101.87s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-32.8664	 = Validation score   (-root_mean_squared_error)
	58.81s	 = Training   runtime
	0.11s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 40.01s of the 39.99s of remaining time.
	-32.5551	 = Validation score   (-root_mean_squared_error)
	7.97s	 = Training   runtime
	0.6s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 29.05s of the 29.03s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.3914	 = Validation score   (-root_mean_squared_error)
	44.37s	 = Training   runtime
	0.54s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -18.66s of remaining time.
	-32.3883	 = Validation score   (-root_mean_squared_error)
	0.35s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 619.19s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20230124_040306/")
In [26]:
predictor_new_features.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -32.388291      21.638019  547.511710                0.000679           0.350135            3       True         16
1     ExtraTreesMSE_BAG_L2  -32.555143      20.238662  397.744583                0.601037           7.967542            2       True         14
2      WeightedEnsemble_L2  -32.753155      18.897724  347.555751                0.001064           0.611749            2       True          9
3          CatBoost_BAG_L2  -32.866351      19.748722  448.583476                0.111097          58.806435            2       True         13
4          LightGBM_BAG_L2  -32.883595      19.767843  409.027248                0.130219          19.250206            2       True         11
5   RandomForestMSE_BAG_L2  -33.177270      20.250873  416.766898                0.613249          26.989857            2       True         12
6   NeuralNetFastAI_BAG_L2  -33.391432      20.181738  434.147535                0.544113          44.370494            2       True         15
7        LightGBMXT_BAG_L2  -33.596635      19.975402  410.410778                0.337778          20.633736            2       True         10
8          LightGBM_BAG_L1  -34.439520       4.735687   41.291645                4.735687          41.291645            1       True          4
9        LightGBMXT_BAG_L1  -34.748807      12.885391   81.596144               12.885391          81.596144            1       True          3
10         CatBoost_BAG_L1  -34.882533       0.179237  209.710797                0.179237         209.710797            1       True          6
11    ExtraTreesMSE_BAG_L1  -38.938443       0.549690    5.022911                0.549690           5.022911            1       True          7
12  RandomForestMSE_BAG_L1  -38.987462       0.546655    9.322506                0.546655           9.322506            1       True          5
13  NeuralNetFastAI_BAG_L1  -88.618850       0.385614   42.777728                0.385614          42.777728            1       True          8
14   KNeighborsDist_BAG_L1 -115.038459       0.191050    0.027272                0.191050           0.027272            1       True          2
15   KNeighborsUnif_BAG_L1 -119.978810       0.164299    0.028038                0.164299           0.028038            1       True          1
Number of models trained: 16
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])  : 2 | ['season', 'weather']
('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
('int', [])       : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
Plot summary of models saved to file: AutogluonModels/ag-20230124_040306/SummaryOfModels.html
*** End of fit() summary ***
Out[26]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -119.97880966975461,
  'KNeighborsDist_BAG_L1': -115.038459148802,
  'LightGBMXT_BAG_L1': -34.748807067972244,
  'LightGBM_BAG_L1': -34.43952035710387,
  'RandomForestMSE_BAG_L1': -38.987461831485355,
  'CatBoost_BAG_L1': -34.88253330930523,
  'ExtraTreesMSE_BAG_L1': -38.9384425957686,
  'NeuralNetFastAI_BAG_L1': -88.61884952457058,
  'WeightedEnsemble_L2': -32.75315538627129,
  'LightGBMXT_BAG_L2': -33.59663520640235,
  'LightGBM_BAG_L2': -32.88359450763695,
  'RandomForestMSE_BAG_L2': -33.17726957645052,
  'CatBoost_BAG_L2': -32.86635106455899,
  'ExtraTreesMSE_BAG_L2': -32.555142712792,
  'NeuralNetFastAI_BAG_L2': -33.39143201082359,
  'WeightedEnsemble_L3': -32.38829124232758},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/LightGBMXT_BAG_L1/',
  'LightGBM_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/LightGBM_BAG_L1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/CatBoost_BAG_L1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/ExtraTreesMSE_BAG_L1/',
  'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/NeuralNetFastAI_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20230124_040306/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/LightGBMXT_BAG_L2/',
  'LightGBM_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/LightGBM_BAG_L2/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/CatBoost_BAG_L2/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/ExtraTreesMSE_BAG_L2/',
  'NeuralNetFastAI_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/NeuralNetFastAI_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20230124_040306/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.02803778648376465,
  'KNeighborsDist_BAG_L1': 0.027272462844848633,
  'LightGBMXT_BAG_L1': 81.59614372253418,
  'LightGBM_BAG_L1': 41.29164505004883,
  'RandomForestMSE_BAG_L1': 9.322505950927734,
  'CatBoost_BAG_L1': 209.7107973098755,
  'ExtraTreesMSE_BAG_L1': 5.0229105949401855,
  'NeuralNetFastAI_BAG_L1': 42.77772831916809,
  'WeightedEnsemble_L2': 0.6117486953735352,
  'LightGBMXT_BAG_L2': 20.63373637199402,
  'LightGBM_BAG_L2': 19.250206470489502,
  'RandomForestMSE_BAG_L2': 26.989856958389282,
  'CatBoost_BAG_L2': 58.806434631347656,
  'ExtraTreesMSE_BAG_L2': 7.967541933059692,
  'NeuralNetFastAI_BAG_L2': 44.37049412727356,
  'WeightedEnsemble_L3': 0.35013484954833984},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.16429948806762695,
  'KNeighborsDist_BAG_L1': 0.19105029106140137,
  'LightGBMXT_BAG_L1': 12.885390520095825,
  'LightGBM_BAG_L1': 4.735687017440796,
  'RandomForestMSE_BAG_L1': 0.5466549396514893,
  'CatBoost_BAG_L1': 0.17923736572265625,
  'ExtraTreesMSE_BAG_L1': 0.5496904850006104,
  'NeuralNetFastAI_BAG_L1': 0.38561439514160156,
  'WeightedEnsemble_L2': 0.0010638236999511719,
  'LightGBMXT_BAG_L2': 0.33777761459350586,
  'LightGBM_BAG_L2': 0.1302187442779541,
  'RandomForestMSE_BAG_L2': 0.6132485866546631,
  'CatBoost_BAG_L2': 0.11109709739685059,
  'ExtraTreesMSE_BAG_L2': 0.6010372638702393,
  'NeuralNetFastAI_BAG_L2': 0.5441131591796875,
  'WeightedEnsemble_L3': 0.0006792545318603516},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -32.388291      21.638019  547.511710   
 1     ExtraTreesMSE_BAG_L2  -32.555143      20.238662  397.744583   
 2      WeightedEnsemble_L2  -32.753155      18.897724  347.555751   
 3          CatBoost_BAG_L2  -32.866351      19.748722  448.583476   
 4          LightGBM_BAG_L2  -32.883595      19.767843  409.027248   
 5   RandomForestMSE_BAG_L2  -33.177270      20.250873  416.766898   
 6   NeuralNetFastAI_BAG_L2  -33.391432      20.181738  434.147535   
 7        LightGBMXT_BAG_L2  -33.596635      19.975402  410.410778   
 8          LightGBM_BAG_L1  -34.439520       4.735687   41.291645   
 9        LightGBMXT_BAG_L1  -34.748807      12.885391   81.596144   
 10         CatBoost_BAG_L1  -34.882533       0.179237  209.710797   
 11    ExtraTreesMSE_BAG_L1  -38.938443       0.549690    5.022911   
 12  RandomForestMSE_BAG_L1  -38.987462       0.546655    9.322506   
 13  NeuralNetFastAI_BAG_L1  -88.618850       0.385614   42.777728   
 14   KNeighborsDist_BAG_L1 -115.038459       0.191050    0.027272   
 15   KNeighborsUnif_BAG_L1 -119.978810       0.164299    0.028038   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000679           0.350135            3       True   
 1                 0.601037           7.967542            2       True   
 2                 0.001064           0.611749            2       True   
 3                 0.111097          58.806435            2       True   
 4                 0.130219          19.250206            2       True   
 5                 0.613249          26.989857            2       True   
 6                 0.544113          44.370494            2       True   
 7                 0.337778          20.633736            2       True   
 8                 4.735687          41.291645            1       True   
 9                12.885391          81.596144            1       True   
 10                0.179237         209.710797            1       True   
 11                0.549690           5.022911            1       True   
 12                0.546655           9.322506            1       True   
 13                0.385614          42.777728            1       True   
 14                0.191050           0.027272            1       True   
 15                0.164299           0.028038            1       True   
 
     fit_order  
 0          16  
 1          14  
 2           9  
 3          13  
 4          11  
 5          12  
 6          15  
 7          10  
 8           4  
 9           3  
 10          6  
 11          7  
 12          5  
 13          8  
 14          2  
 15          1  }
In [27]:
predictor_new_features.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
Out[27]:
<AxesSubplot:xlabel='model'>
In [29]:
# Remember to set all negative values to zero

predictions_new_features = predictor_new_features.predict(test)
print((predictions_new_features < 0).sum())
predictions_new_features[predictions_new_features<0] = 0
predictions_new_features.describe()
0
Out[29]:
count    6493.000000
mean      189.825287
std       173.529587
min         2.279106
25%        46.497772
50%       147.084320
75%       277.898193
max       905.759033
Name: count, dtype: float64
In [40]:
test["count"] = 0
performance_new_features_2 = predictor_new_features.evaluate(test)
print("The performance indicators are : \n", performance_new_features_2)
/usr/local/lib/python3.8/dist-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -257.1799339333484
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -257.1799339333484,
    "mean_squared_error": -66141.51841796147,
    "mean_absolute_error": -189.82528704993257,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -147.08432006835938
}
The performance indicators are : 
 {'root_mean_squared_error': -257.1799339333484, 'mean_squared_error': -66141.51841796147, 'mean_absolute_error': -189.82528704993257, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -147.08432006835938}
In [31]:
# Same submitting predictions
submission_new_features = pd.read_csv('submission.csv')
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features_2.csv", index=False)
In [37]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features 2"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 310kB/s]
Successfully submitted to Bike Sharing Demand
In [38]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                        status    publicScore  privateScore  
---------------------------  -------------------  ---------------------------------  --------  -----------  ------------  
submission_new_features.csv  2023-01-24 04:22:35  new features 2                     complete  0.69366      0.69366       
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters  complete  1.31738      1.31738       
submission_new_features.csv  2023-01-23 05:13:22  new features                       complete  0.69366      0.69366       
submission.csv               2023-01-23 04:09:07  first raw submission               complete  1.80760      1.80760       

New Score of 0.69366¶

Step 6: Hyper parameter optimization¶

  • There are many options for hyper parameter optimization.
  • Options are to change the AutoGluon higher level parameters or the individual model hyperparameters.
  • The hyperparameters of the models themselves that are in AutoGluon. Those need the hyperparameter and hyperparameter_tune_kwargs arguments.
In [44]:
train_new_hpo = train[train.columns.to_list()]
train_new_hpo.head()
Out[44]:
season holiday workingday weather temp atemp humidity windspeed count hour year month day weekday
0 1 0 0 1 9.84 14.395 81 0.0 16 0 2011 1 1 5
1 1 0 0 1 9.02 13.635 80 0.0 40 1 2011 1 1 5
2 1 0 0 1 9.02 13.635 80 0.0 32 2 2011 1 1 5
3 1 0 0 1 9.84 14.395 75 0.0 13 3 2011 1 1 5
4 1 0 0 1 9.84 14.395 75 0.0 1 4 2011 1 1 5
In [55]:
#https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html#specifying-hyperparameters-and-tuning-them

import autogluon.core as ag

nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs 
    'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'layers': ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]),  # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
    'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': ag.space.Int(lower=26, upper=66, default=36)  # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {  
    # hyperparameters of each model type
    'GBM': gbm_options,
    'NN': nn_options }  

search_strategy = 'auto'

hyperparameter_tune_kwargs = {  
    # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'scheduler' : 'local', # local scheduler
    'searcher': search_strategy
}

predictor_new_hpo = TabularPredictor(label="count").fit(train_data=train_new_hpo, time_limit=600, presets="best_quality", hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
In [19]:
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                  model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   WeightedEnsemble_L3 -132.758746       0.002508  237.809628                0.001135           1.007823            3       True         16
1    LightGBM_BAG_L2/T1 -132.910994       0.001118  196.132830                0.000139          19.016073            2       True         11
2    LightGBM_BAG_L2/T2 -132.971214       0.001108  197.085495                0.000129          19.968738            2       True         12
3   WeightedEnsemble_L2 -133.074144       0.001216   38.909853                0.000999           0.441841            2       True         10
4    LightGBM_BAG_L2/T5 -133.191017       0.001094  197.683682                0.000115          20.566925            2       True         15
5    LightGBM_BAG_L1/T8 -133.366596       0.000085   19.524052                0.000085          19.524052            1       True          8
6    LightGBM_BAG_L2/T3 -133.454683       0.001105  197.816994                0.000126          20.700237            2       True         13
7    LightGBM_BAG_L1/T7 -134.190162       0.000132   18.943960                0.000132          18.943960            1       True          7
8    LightGBM_BAG_L1/T3 -134.194130       0.000125   19.437752                0.000125          19.437752            1       True          3
9    LightGBM_BAG_L1/T2 -135.029528       0.000127   18.838973                0.000127          18.838973            1       True          2
10   LightGBM_BAG_L1/T1 -135.473207       0.000152   23.195276                0.000152          23.195276            1       True          1
11   LightGBM_BAG_L1/T5 -135.746640       0.000085   19.472352                0.000085          19.472352            1       True          5
12   LightGBM_BAG_L2/T4 -148.765464       0.001064  196.921249                0.000085          19.804492            2       True         14
13   LightGBM_BAG_L1/T9 -152.443903       0.000086   19.152268                0.000086          19.152268            1       True          9
14   LightGBM_BAG_L1/T6 -153.737220       0.000082   19.267452                0.000082          19.267452            1       True          6
15   LightGBM_BAG_L1/T4 -156.019224       0.000105   19.284673                0.000105          19.284673            1       True          4
Number of models trained: 16
Types of models trained:
{'WeightedEnsembleModel', 'StackerEnsembleModel_LGB'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])                : 3 | ['temp', 'atemp', 'windspeed']
('int', [])                  : 3 | ['season', 'weather', 'humidity']
('int', ['bool'])            : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20230123_150537/SummaryOfModels.html
*** End of fit() summary ***
Out[19]:
{'model_types': {'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T3': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T4': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T5': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T6': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T7': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T8': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T9': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T3': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T4': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T5': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'LightGBM_BAG_L1/T1': -135.4732072756916,
  'LightGBM_BAG_L1/T2': -135.02952795945737,
  'LightGBM_BAG_L1/T3': -134.19413006667938,
  'LightGBM_BAG_L1/T4': -156.01922351304077,
  'LightGBM_BAG_L1/T5': -135.74663980949632,
  'LightGBM_BAG_L1/T6': -153.73722037655872,
  'LightGBM_BAG_L1/T7': -134.19016227420124,
  'LightGBM_BAG_L1/T8': -133.36659565430654,
  'LightGBM_BAG_L1/T9': -152.443903346809,
  'WeightedEnsemble_L2': -133.07414427738493,
  'LightGBM_BAG_L2/T1': -132.91099439270775,
  'LightGBM_BAG_L2/T2': -132.97121381309128,
  'LightGBM_BAG_L2/T3': -133.45468287620238,
  'LightGBM_BAG_L2/T4': -148.76546366752737,
  'LightGBM_BAG_L2/T5': -133.19101680216852,
  'WeightedEnsemble_L3': -132.75874630547804},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'LightGBM_BAG_L1/T1': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T1/',
  'LightGBM_BAG_L1/T2': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T2/',
  'LightGBM_BAG_L1/T3': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T3/',
  'LightGBM_BAG_L1/T4': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T4/',
  'LightGBM_BAG_L1/T5': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T5/',
  'LightGBM_BAG_L1/T6': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T6/',
  'LightGBM_BAG_L1/T7': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T7/',
  'LightGBM_BAG_L1/T8': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T8/',
  'LightGBM_BAG_L1/T9': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T9/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20230123_150537/models/WeightedEnsemble_L2/',
  'LightGBM_BAG_L2/T1': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T1/',
  'LightGBM_BAG_L2/T2': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T2/',
  'LightGBM_BAG_L2/T3': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T3/',
  'LightGBM_BAG_L2/T4': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T4/',
  'LightGBM_BAG_L2/T5': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T5/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20230123_150537/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'LightGBM_BAG_L1/T1': 23.195276021957397,
  'LightGBM_BAG_L1/T2': 18.8389732837677,
  'LightGBM_BAG_L1/T3': 19.437751531600952,
  'LightGBM_BAG_L1/T4': 19.28467321395874,
  'LightGBM_BAG_L1/T5': 19.472351551055908,
  'LightGBM_BAG_L1/T6': 19.267452001571655,
  'LightGBM_BAG_L1/T7': 18.943959951400757,
  'LightGBM_BAG_L1/T8': 19.524051666259766,
  'LightGBM_BAG_L1/T9': 19.152267694473267,
  'WeightedEnsemble_L2': 0.44184088706970215,
  'LightGBM_BAG_L2/T1': 19.01607322692871,
  'LightGBM_BAG_L2/T2': 19.968738079071045,
  'LightGBM_BAG_L2/T3': 20.700237035751343,
  'LightGBM_BAG_L2/T4': 19.804491758346558,
  'LightGBM_BAG_L2/T5': 20.566925287246704,
  'WeightedEnsemble_L3': 1.0078227519989014},
 'model_pred_times': {'LightGBM_BAG_L1/T1': 0.00015163421630859375,
  'LightGBM_BAG_L1/T2': 0.0001266002655029297,
  'LightGBM_BAG_L1/T3': 0.0001251697540283203,
  'LightGBM_BAG_L1/T4': 0.00010514259338378906,
  'LightGBM_BAG_L1/T5': 8.463859558105469e-05,
  'LightGBM_BAG_L1/T6': 8.249282836914062e-05,
  'LightGBM_BAG_L1/T7': 0.0001323223114013672,
  'LightGBM_BAG_L1/T8': 8.487701416015625e-05,
  'LightGBM_BAG_L1/T9': 8.58306884765625e-05,
  'WeightedEnsemble_L2': 0.0009992122650146484,
  'LightGBM_BAG_L2/T1': 0.00013899803161621094,
  'LightGBM_BAG_L2/T2': 0.0001289844512939453,
  'LightGBM_BAG_L2/T3': 0.00012636184692382812,
  'LightGBM_BAG_L2/T4': 8.535385131835938e-05,
  'LightGBM_BAG_L2/T5': 0.00011491775512695312,
  'WeightedEnsemble_L3': 0.0011348724365234375},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'LightGBM_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T5': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T6': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T7': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T8': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T9': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T5': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                   model   score_val  pred_time_val    fit_time  \
 0   WeightedEnsemble_L3 -132.758746       0.002508  237.809628   
 1    LightGBM_BAG_L2/T1 -132.910994       0.001118  196.132830   
 2    LightGBM_BAG_L2/T2 -132.971214       0.001108  197.085495   
 3   WeightedEnsemble_L2 -133.074144       0.001216   38.909853   
 4    LightGBM_BAG_L2/T5 -133.191017       0.001094  197.683682   
 5    LightGBM_BAG_L1/T8 -133.366596       0.000085   19.524052   
 6    LightGBM_BAG_L2/T3 -133.454683       0.001105  197.816994   
 7    LightGBM_BAG_L1/T7 -134.190162       0.000132   18.943960   
 8    LightGBM_BAG_L1/T3 -134.194130       0.000125   19.437752   
 9    LightGBM_BAG_L1/T2 -135.029528       0.000127   18.838973   
 10   LightGBM_BAG_L1/T1 -135.473207       0.000152   23.195276   
 11   LightGBM_BAG_L1/T5 -135.746640       0.000085   19.472352   
 12   LightGBM_BAG_L2/T4 -148.765464       0.001064  196.921249   
 13   LightGBM_BAG_L1/T9 -152.443903       0.000086   19.152268   
 14   LightGBM_BAG_L1/T6 -153.737220       0.000082   19.267452   
 15   LightGBM_BAG_L1/T4 -156.019224       0.000105   19.284673   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.001135           1.007823            3       True   
 1                 0.000139          19.016073            2       True   
 2                 0.000129          19.968738            2       True   
 3                 0.000999           0.441841            2       True   
 4                 0.000115          20.566925            2       True   
 5                 0.000085          19.524052            1       True   
 6                 0.000126          20.700237            2       True   
 7                 0.000132          18.943960            1       True   
 8                 0.000125          19.437752            1       True   
 9                 0.000127          18.838973            1       True   
 10                0.000152          23.195276            1       True   
 11                0.000085          19.472352            1       True   
 12                0.000085          19.804492            2       True   
 13                0.000086          19.152268            1       True   
 14                0.000082          19.267452            1       True   
 15                0.000105          19.284673            1       True   
 
     fit_order  
 0          16  
 1          11  
 2          12  
 3          10  
 4          15  
 5           8  
 6          13  
 7           7  
 8           3  
 9           2  
 10          1  
 11          5  
 12         14  
 13          9  
 14          6  
 15          4  }
In [20]:
# Remember to set all negative values to zero
predictions_new_hpo = predictor_new_hpo.predict(test)
print((predictions_new_hpo < 0).sum())
predictions_new_hpo[predictions_new_hpo<0] = 0
predictions_new_hpo.describe()
0
Out[20]:
count    6493.000000
mean      195.552979
std       118.013786
min        44.590271
25%       108.154114
50%       166.215988
75%       265.650452
max       586.378479
Name: count, dtype: float64
In [21]:
# Same submitting predictions
submission_new_hpo = pd.read_csv('submission.csv')
submission_new_hpo["count"] = predictions_new_hpo
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
In [22]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 303kB/s]
Successfully submitted to Bike Sharing Demand
In [23]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                        status    publicScore  privateScore  
---------------------------  -------------------  ---------------------------------  --------  -----------  ------------  
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters  complete  1.31738      1.31738       
submission_new_features.csv  2023-01-23 05:13:22  new features                       complete  0.69366      0.69366       
submission.csv               2023-01-23 04:09:07  first raw submission               complete  1.80760      1.80760       

New Score of 1.3173¶

In [45]:
# second attempt of hypertuning using default 
hyperparameters = 'default'
hyperparameter_tune_kwargs = 'auto'

predictor_new_hpo_2 = TabularPredictor(label="count").fit(train_data=train_new_hpo, time_limit=600, presets="best_quality", hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
No model was trained during hyperparameter tuning NeuralNetTorch_BAG_L2... Skipping this model.
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 28.54s of the 71.13s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.5697	 = Validation score   (-root_mean_squared_error)
	32.43s	 = Training   runtime
	0.22s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 35.29s of remaining time.
	-32.4438	 = Validation score   (-root_mean_squared_error)
	0.52s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 565.42s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20230124_044035/")
In [46]:
predictor_new_hpo_2.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -32.443774       2.589879  365.024152                0.000785           0.522403            3       True         18
1      WeightedEnsemble_L2  -32.653702       2.588854  199.551741                0.000812           0.460111            2       True         10
2     ExtraTreesMSE_BAG_L2  -32.681000       2.588577  248.454099                0.000148          11.326929            2       True         15
3       LightGBM_BAG_L2/T1  -32.869428       2.588549  259.950361                0.000120          22.823192            2       True         12
4   RandomForestMSE_BAG_L2  -32.944686       2.588613  269.409279                0.000184          32.282109            2       True         13
5       CatBoost_BAG_L2/T1  -33.020736       2.588523  271.446168                0.000094          34.318999            2       True         14
6        XGBoost_BAG_L2/T1  -33.070961       2.588547  263.750520                0.000119          26.623350            2       True         16
7     LightGBMLarge_BAG_L2  -33.569674       2.804723  269.557356                0.216294          32.430187            2       True         17
8     LightGBMXT_BAG_L2/T1  -33.737242       2.588577  260.944040                0.000149          23.816871            2       True         11
9     LightGBMLarge_BAG_L1  -34.085048       2.587255   41.989262                2.587255          41.989262            1       True          9
10      LightGBM_BAG_L1/T1  -34.439520       0.000121   46.736726                0.000121          46.736726            1       True          4
11    LightGBMXT_BAG_L1/T1  -34.929700       0.000129   55.958177                0.000129          55.958177            1       True          3
12       XGBoost_BAG_L1/T1  -35.840374       0.000089   33.858625                0.000089          33.858625            1       True          8
13    ExtraTreesMSE_BAG_L1  -38.938443       0.000170    8.325169                0.000170           8.325169            1       True          7
14  RandomForestMSE_BAG_L1  -38.987462       0.000279   12.223671                0.000279          12.223671            1       True          5
15      CatBoost_BAG_L1/T1  -40.977680       0.000129   37.359621                0.000129          37.359621            1       True          6
16   KNeighborsDist_BAG_L1 -115.038459       0.000122    0.316092                0.000122           0.316092            1       True          2
17   KNeighborsUnif_BAG_L1 -119.978810       0.000135    0.359826                0.000135           0.359826            1       True          1
Number of models trained: 18
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_XGBoost', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])  : 2 | ['season', 'weather']
('float', [])     : 3 | ['temp', 'atemp', 'windspeed']
('int', [])       : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
Plot summary of models saved to file: AutogluonModels/ag-20230124_044035/SummaryOfModels.html
*** End of fit() summary ***
Out[46]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1/T1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'XGBoost_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2/T1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'XGBoost_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -119.97880966975461,
  'KNeighborsDist_BAG_L1': -115.038459148802,
  'LightGBMXT_BAG_L1/T1': -34.9297003401567,
  'LightGBM_BAG_L1/T1': -34.43952035710387,
  'RandomForestMSE_BAG_L1': -38.987461831485355,
  'CatBoost_BAG_L1/T1': -40.97768035938449,
  'ExtraTreesMSE_BAG_L1': -38.9384425957686,
  'XGBoost_BAG_L1/T1': -35.840374362701326,
  'LightGBMLarge_BAG_L1': -34.085047986742616,
  'WeightedEnsemble_L2': -32.65370241912103,
  'LightGBMXT_BAG_L2/T1': -33.73724153874836,
  'LightGBM_BAG_L2/T1': -32.869428476139554,
  'RandomForestMSE_BAG_L2': -32.944685579678655,
  'CatBoost_BAG_L2/T1': -33.02073563789231,
  'ExtraTreesMSE_BAG_L2': -32.68099996467634,
  'XGBoost_BAG_L2/T1': -33.07096072504919,
  'LightGBMLarge_BAG_L2': -33.56967448130482,
  'WeightedEnsemble_L3': -32.4437742724192},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBMXT_BAG_L1/T1/',
  'LightGBM_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBM_BAG_L1/T1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/CatBoost_BAG_L1/T1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/ExtraTreesMSE_BAG_L1/',
  'XGBoost_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/XGBoost_BAG_L1/T1/',
  'LightGBMLarge_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/LightGBMLarge_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20230124_044035/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBMXT_BAG_L2/T1/',
  'LightGBM_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBM_BAG_L2/T1/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/CatBoost_BAG_L2/T1/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/ExtraTreesMSE_BAG_L2/',
  'XGBoost_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/XGBoost_BAG_L2/T1/',
  'LightGBMLarge_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/LightGBMLarge_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20230124_044035/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.35982608795166016,
  'KNeighborsDist_BAG_L1': 0.31609177589416504,
  'LightGBMXT_BAG_L1/T1': 55.95817732810974,
  'LightGBM_BAG_L1/T1': 46.73672556877136,
  'RandomForestMSE_BAG_L1': 12.223671197891235,
  'CatBoost_BAG_L1/T1': 37.35962128639221,
  'ExtraTreesMSE_BAG_L1': 8.325169086456299,
  'XGBoost_BAG_L1/T1': 33.858625173568726,
  'LightGBMLarge_BAG_L1': 41.989262104034424,
  'WeightedEnsemble_L2': 0.4601109027862549,
  'LightGBMXT_BAG_L2/T1': 23.81687068939209,
  'LightGBM_BAG_L2/T1': 22.82319164276123,
  'RandomForestMSE_BAG_L2': 32.28210926055908,
  'CatBoost_BAG_L2/T1': 34.31899881362915,
  'ExtraTreesMSE_BAG_L2': 11.326929092407227,
  'XGBoost_BAG_L2/T1': 26.623350381851196,
  'LightGBMLarge_BAG_L2': 32.43018651008606,
  'WeightedEnsemble_L3': 0.5224027633666992},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.0001347064971923828,
  'KNeighborsDist_BAG_L1': 0.0001220703125,
  'LightGBMXT_BAG_L1/T1': 0.0001285076141357422,
  'LightGBM_BAG_L1/T1': 0.00012111663818359375,
  'RandomForestMSE_BAG_L1': 0.00027871131896972656,
  'CatBoost_BAG_L1/T1': 0.00012946128845214844,
  'ExtraTreesMSE_BAG_L1': 0.0001697540283203125,
  'XGBoost_BAG_L1/T1': 8.893013000488281e-05,
  'LightGBMLarge_BAG_L1': 2.5872554779052734,
  'WeightedEnsemble_L2': 0.0008118152618408203,
  'LightGBMXT_BAG_L2/T1': 0.00014853477478027344,
  'LightGBM_BAG_L2/T1': 0.00011992454528808594,
  'RandomForestMSE_BAG_L2': 0.0001838207244873047,
  'CatBoost_BAG_L2/T1': 9.441375732421875e-05,
  'ExtraTreesMSE_BAG_L2': 0.00014829635620117188,
  'XGBoost_BAG_L2/T1': 0.00011873245239257812,
  'LightGBMLarge_BAG_L2': 0.2162938117980957,
  'WeightedEnsemble_L3': 0.0007851123809814453},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'XGBoost_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'XGBoost_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -32.443774       2.589879  365.024152   
 1      WeightedEnsemble_L2  -32.653702       2.588854  199.551741   
 2     ExtraTreesMSE_BAG_L2  -32.681000       2.588577  248.454099   
 3       LightGBM_BAG_L2/T1  -32.869428       2.588549  259.950361   
 4   RandomForestMSE_BAG_L2  -32.944686       2.588613  269.409279   
 5       CatBoost_BAG_L2/T1  -33.020736       2.588523  271.446168   
 6        XGBoost_BAG_L2/T1  -33.070961       2.588547  263.750520   
 7     LightGBMLarge_BAG_L2  -33.569674       2.804723  269.557356   
 8     LightGBMXT_BAG_L2/T1  -33.737242       2.588577  260.944040   
 9     LightGBMLarge_BAG_L1  -34.085048       2.587255   41.989262   
 10      LightGBM_BAG_L1/T1  -34.439520       0.000121   46.736726   
 11    LightGBMXT_BAG_L1/T1  -34.929700       0.000129   55.958177   
 12       XGBoost_BAG_L1/T1  -35.840374       0.000089   33.858625   
 13    ExtraTreesMSE_BAG_L1  -38.938443       0.000170    8.325169   
 14  RandomForestMSE_BAG_L1  -38.987462       0.000279   12.223671   
 15      CatBoost_BAG_L1/T1  -40.977680       0.000129   37.359621   
 16   KNeighborsDist_BAG_L1 -115.038459       0.000122    0.316092   
 17   KNeighborsUnif_BAG_L1 -119.978810       0.000135    0.359826   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000785           0.522403            3       True   
 1                 0.000812           0.460111            2       True   
 2                 0.000148          11.326929            2       True   
 3                 0.000120          22.823192            2       True   
 4                 0.000184          32.282109            2       True   
 5                 0.000094          34.318999            2       True   
 6                 0.000119          26.623350            2       True   
 7                 0.216294          32.430187            2       True   
 8                 0.000149          23.816871            2       True   
 9                 2.587255          41.989262            1       True   
 10                0.000121          46.736726            1       True   
 11                0.000129          55.958177            1       True   
 12                0.000089          33.858625            1       True   
 13                0.000170           8.325169            1       True   
 14                0.000279          12.223671            1       True   
 15                0.000129          37.359621            1       True   
 16                0.000122           0.316092            1       True   
 17                0.000135           0.359826            1       True   
 
     fit_order  
 0          18  
 1          10  
 2          15  
 3          12  
 4          13  
 5          14  
 6          16  
 7          17  
 8          11  
 9           9  
 10          4  
 11          3  
 12          8  
 13          7  
 14          5  
 15          6  
 16          2  
 17          1  }
In [47]:
predictor_new_hpo_2.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
Out[47]:
<AxesSubplot:xlabel='model'>
In [48]:
test["count"] = 0
performance_new_hpo_2 = predictor_new_hpo_2.evaluate(test)
print("The performance indicators are : \n", performance_new_hpo_2)
/usr/local/lib/python3.8/dist-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -257.28843951366616
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -257.28843951366616,
    "mean_squared_error": -66197.34110737746,
    "mean_absolute_error": -189.76286823456155,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -147.41567993164062
}
The performance indicators are : 
 {'root_mean_squared_error': -257.28843951366616, 'mean_squared_error': -66197.34110737746, 'mean_absolute_error': -189.76286823456155, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -147.41567993164062}
In [49]:
# Remember to set all negative values to zero
predictions_new_hpo_2 = predictor_new_hpo_2.predict(test)
print((predictions_new_hpo_2 < 0).sum())
predictions_new_hpo_2[predictions_new_hpo_2<0] = 0
predictions_new_hpo_2.describe()
0
Out[49]:
count    6493.000000
mean      189.762863
std       173.758575
min         3.497172
25%        45.154335
50%       147.415680
75%       278.535583
max       904.307739
Name: count, dtype: float64
In [50]:
# Same submitting predictions
submission_new_hpo_2 = pd.read_csv('submission.csv')
submission_new_hpo_2["count"] = predictions_new_hpo_2
submission_new_hpo_2.to_csv("submission_new_hpo_2.csv", index=False)
In [51]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo_2.csv -m "new features with hyperparameters 2"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 341kB/s]
Successfully submitted to Bike Sharing Demand
In [52]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                          status    publicScore  privateScore  
---------------------------  -------------------  -----------------------------------  --------  -----------  ------------  
submission_new_hpo_2.csv     2023-01-24 04:58:04  new features with hyperparameters 2  complete  0.44585      0.44585       
submission_new_features.csv  2023-01-24 04:22:35  new features 2                       complete  0.69366      0.69366       
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters    complete  1.31738      1.31738       
submission_new_features.csv  2023-01-23 05:13:22  new features                         complete  0.69366      0.69366       

Step 7: Write a Report¶

Refer to the markdown file for the full report¶

Creating plots and table for report¶

In [73]:
# Taking the top model score from each training run and creating a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
    {
        "model": ["initial", "add_features", "hpo", "add features 2", "hpo 2"],
        "score": [-52.885174, -30.024358, -132.758746, -32.3882,-32.443774]
    }
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score_2.png')
In [53]:
# Take the 3 kaggle scores and creating a line plot to show improvement
fig = pd.DataFrame(
    {
        "test_eval": ["initial", "add_features", "hpo","add features 2", "hpo 2"],
        "score": [1.80760,0.69366,1.31738,0.69366,0.4458]
    }
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score_2.png')

Hyperparameter table¶

In [72]:
# The 3 hyperparameters we tuned with the kaggle score as the result
pd.DataFrame({
    "model": ["initial", "add_features", "hpo","add features 2", "hpo 2"],
    "time_limit": [600, 600, 600, 600, 600],
    "presets": ["best_quality", "best_quality", "best_quality", "best_quality", "best_quality"],
    "hyperparameters": ['default','default', "{'GBM: "+ str(gbm_options)+"}, {NN: "+ str(nn_options)+"}", 'default','default'],
    "hyperparameter_tune_kwargs":["-", "-", "auto","-", "{'searcher':'auto'}"],
    "score": [1.80760,0.69366,1.31738, 0.69366, 0.4458]
})
Out[72]:
model time_limit presets hyperparameters hyperparameter_tune_kwargs score
0 initial 600 best_quality default - 1.80760
1 add_features 600 best_quality default - 0.69366
2 hpo 600 best_quality {'GBM: {'num_boost_round': 100, 'num_leaves': Int: lower=26, upper=66}}, {NN: {'num_epochs': 10, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'layers': Categorical[[100], [1000], [200, 100], [300, 200, 100]], 'dropout_prob': Real: lower=0.0, upper=0.5}} auto 1.31738
3 add features 2 600 best_quality default - 0.69366
4 hpo 2 600 best_quality default {'searcher':'auto'} 0.44580
In [98]:
!tar --version
tar (GNU tar) 1.30
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
In [99]:
ls
AutogluonModels/             submission.csv
bike-sharing-demand.zip      submission_new_features.csv
cd0385-project-starter/      submission_new_features_2.csv
histogram_hours_feature.png  submission_new_hpo.csv
model_test_score.png         submission_new_hpo_2.csv
model_test_score_2.png       test.csv
model_train_score.png        train.csv
model_train_score_2.png      training_runs.png
sampleSubmission.csv
In [103]:
!tar --exclude='AutogluonModels' --exclude='./.*' -zcvf backup.tar.gz .
./
./sampleSubmission.csv
./train.csv
./histogram_hours_feature.png
./model_train_score.png
./backup.tar.gz
./test.csv
./training_runs.png
./submission.csv
./submission_new_hpo_2.csv
./model_test_score.png
./submission_new_features_2.csv
./submission_new_hpo.csv
./model_train_score_2.png
./submission_new_features.csv
./cd0385-project-starter/
./cd0385-project-starter/.git/
./cd0385-project-starter/.git/refs/
./cd0385-project-starter/.git/refs/heads/
./cd0385-project-starter/.git/refs/heads/main
./cd0385-project-starter/.git/refs/remotes/
./cd0385-project-starter/.git/refs/remotes/origin/
./cd0385-project-starter/.git/refs/remotes/origin/HEAD
./cd0385-project-starter/.git/refs/tags/
./cd0385-project-starter/.git/index
./cd0385-project-starter/.git/hooks/
./cd0385-project-starter/.git/hooks/pre-push.sample
./cd0385-project-starter/.git/hooks/pre-merge-commit.sample
./cd0385-project-starter/.git/hooks/pre-applypatch.sample
./cd0385-project-starter/.git/hooks/applypatch-msg.sample
./cd0385-project-starter/.git/hooks/post-update.sample
./cd0385-project-starter/.git/hooks/pre-rebase.sample
./cd0385-project-starter/.git/hooks/fsmonitor-watchman.sample
./cd0385-project-starter/.git/hooks/prepare-commit-msg.sample
./cd0385-project-starter/.git/hooks/commit-msg.sample
./cd0385-project-starter/.git/hooks/pre-commit.sample
./cd0385-project-starter/.git/hooks/push-to-checkout.sample
./cd0385-project-starter/.git/hooks/update.sample
./cd0385-project-starter/.git/hooks/pre-receive.sample
./cd0385-project-starter/.git/description
./cd0385-project-starter/.git/info/
./cd0385-project-starter/.git/info/exclude
./cd0385-project-starter/.git/HEAD
./cd0385-project-starter/.git/objects/
./cd0385-project-starter/.git/objects/info/
./cd0385-project-starter/.git/objects/pack/
./cd0385-project-starter/.git/objects/pack/pack-eadf976caa534391b6423e05e4c7e0705fcccd87.idx
./cd0385-project-starter/.git/objects/pack/pack-eadf976caa534391b6423e05e4c7e0705fcccd87.pack
./cd0385-project-starter/.git/config
./cd0385-project-starter/.git/branches/
./cd0385-project-starter/.git/logs/
./cd0385-project-starter/.git/logs/refs/
./cd0385-project-starter/.git/logs/refs/heads/
./cd0385-project-starter/.git/logs/refs/heads/main
./cd0385-project-starter/.git/logs/refs/remotes/
./cd0385-project-starter/.git/logs/refs/remotes/origin/
./cd0385-project-starter/.git/logs/refs/remotes/origin/HEAD
./cd0385-project-starter/.git/logs/HEAD
./cd0385-project-starter/.git/packed-refs
./cd0385-project-starter/.ipynb_checkpoints/
./cd0385-project-starter/.ipynb_checkpoints/README-checkpoint.md
./cd0385-project-starter/.github/
./cd0385-project-starter/.github/workflows/
./cd0385-project-starter/.github/workflows/manual.yml
./cd0385-project-starter/CODEOWNERS
./cd0385-project-starter/README.md
./cd0385-project-starter/project/
./cd0385-project-starter/project/project-template.ipynb
./cd0385-project-starter/project/report-template.md
./cd0385-project-starter/project/.ipynb_checkpoints/
./cd0385-project-starter/project/.ipynb_checkpoints/README-checkpoint.md
./cd0385-project-starter/project/.ipynb_checkpoints/report-template-checkpoint.md
./cd0385-project-starter/project/.ipynb_checkpoints/project-template-checkpoint.ipynb
./cd0385-project-starter/project/img/
./cd0385-project-starter/project/img/model_train_score.png
./cd0385-project-starter/project/img/sagemaker-studio-git1.png
./cd0385-project-starter/project/img/model_test_score.png
./cd0385-project-starter/project/img/sagemaker-studio-git2.png
./cd0385-project-starter/project/README.md
./bike-sharing-demand.zip
./model_test_score_2.png
tar: .: file changed as we read it
In [84]:
!tar --help
Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can
restore individual files from the archive.

Examples:
  tar -cf archive.tar foo bar  # Create archive.tar from files foo and bar.
  tar -tvf archive.tar         # List all files in archive.tar verbosely.
  tar -xf archive.tar          # Extract all files from archive.tar.

 Local file name selection:

      --add-file=FILE        add given FILE to the archive (useful if its name
                             starts with a dash)
  -C, --directory=DIR        change to directory DIR
      --exclude=PATTERN      exclude files, given as a PATTERN
      --exclude-backups      exclude backup and lock files
      --exclude-caches       exclude contents of directories containing
                             CACHEDIR.TAG, except for the tag file itself
      --exclude-caches-all   exclude directories containing CACHEDIR.TAG
      --exclude-caches-under exclude everything under directories containing
                             CACHEDIR.TAG
      --exclude-ignore=FILE  read exclude patterns for each directory from
                             FILE, if it exists
      --exclude-ignore-recursive=FILE
                             read exclude patterns for each directory and its
                             subdirectories from FILE, if it exists
      --exclude-tag=FILE     exclude contents of directories containing FILE,
                             except for FILE itself
      --exclude-tag-all=FILE exclude directories containing FILE
      --exclude-tag-under=FILE   exclude everything under directories
                             containing FILE
      --exclude-vcs          exclude version control system directories
      --exclude-vcs-ignores  read exclude patterns from the VCS ignore files
      --no-null              disable the effect of the previous --null option
      --no-recursion         avoid descending automatically in directories
      --no-unquote           do not unquote input file or member names
      --no-verbatim-files-from   -T treats file names starting with dash as
                             options (default)
      --null                 -T reads null-terminated names; implies
                             --verbatim-files-from
      --recursion            recurse into directories (default)
  -T, --files-from=FILE      get names to extract or create from FILE
      --unquote              unquote input file or member names (default)
      --verbatim-files-from  -T reads file names verbatim (no escape or option
                             handling)
  -X, --exclude-from=FILE    exclude patterns listed in FILE

 File name matching options (affect both exclude and include patterns):

      --anchored             patterns match file name start
      --ignore-case          ignore case
      --no-anchored          patterns match after any '/' (default for
                             exclusion)
      --no-ignore-case       case sensitive matching (default)
      --no-wildcards         verbatim string matching
      --no-wildcards-match-slash   wildcards do not match '/'
      --wildcards            use wildcards (default for exclusion)
      --wildcards-match-slash   wildcards match '/' (default for exclusion)

 Main operation mode:

  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create               create a new archive
  -d, --diff, --compare      find differences between archive and file system
      --delete               delete from the archive (not on mag tapes!)
  -r, --append               append files to the end of an archive
  -t, --list                 list the contents of an archive
      --test-label           test the archive volume label and exit
  -u, --update               only append files newer than copy in archive
  -x, --extract, --get       extract files from an archive

 Operation modifiers:

      --check-device         check device numbers when creating incremental
                             archives (default)
  -g, --listed-incremental=FILE   handle new GNU-format incremental backup
  -G, --incremental          handle old GNU-format incremental backup
      --hole-detection=TYPE  technique to detect holes
      --ignore-failed-read   do not exit with nonzero on unreadable files
      --level=NUMBER         dump level for created listed-incremental archive
  -n, --seek                 archive is seekable
      --no-check-device      do not check device numbers when creating
                             incremental archives
      --no-seek              archive is not seekable
      --occurrence[=NUMBER]  process only the NUMBERth occurrence of each file
                             in the archive; this option is valid only in
                             conjunction with one of the subcommands --delete,
                             --diff, --extract or --list and when a list of
                             files is given either on the command line or via
                             the -T option; NUMBER defaults to 1
      --sparse-version=MAJOR[.MINOR]
                             set version of the sparse format to use (implies
                             --sparse)
  -S, --sparse               handle sparse files efficiently

 Overwrite control:

  -k, --keep-old-files       don't replace existing files when extracting,
                             treat them as errors
      --keep-directory-symlink   preserve existing symlinks to directories when
                             extracting
      --keep-newer-files     don't replace existing files that are newer than
                             their archive copies
      --no-overwrite-dir     preserve metadata of existing directories
      --one-top-level[=DIR]  create a subdirectory to avoid having loose files
                             extracted
      --overwrite            overwrite existing files when extracting
      --overwrite-dir        overwrite metadata of existing directories when
                             extracting (default)
      --recursive-unlink     empty hierarchies prior to extracting directory
      --remove-files         remove files after adding them to the archive
      --skip-old-files       don't replace existing files when extracting,
                             silently skip over them
  -U, --unlink-first         remove each file prior to extracting over it
  -W, --verify               attempt to verify the archive after writing it

 Select output stream:

      --ignore-command-error ignore exit codes of children
      --no-ignore-command-error   treat non-zero exit codes of children as
                             error
  -O, --to-stdout            extract files to standard output
      --to-command=COMMAND   pipe extracted files to another program

 Handling of file attributes:

      --atime-preserve[=METHOD]   preserve access times on dumped files, either
                             by restoring the times after reading
                             (METHOD='replace'; default) or by not setting the
                             times in the first place (METHOD='system')
      --clamp-mtime          only set time when the file is more recent than
                             what was given with --mtime
      --delay-directory-restore   delay setting modification times and
                             permissions of extracted directories until the end
                             of extraction
      --group=NAME           force NAME as group for added files
      --group-map=FILE       use FILE to map file owner GIDs and names
      --mode=CHANGES         force (symbolic) mode CHANGES for added files
      --mtime=DATE-OR-FILE   set mtime for added files from DATE-OR-FILE
  -m, --touch                don't extract file modified time
      --no-delay-directory-restore
                             cancel the effect of --delay-directory-restore
                             option
      --no-same-owner        extract files as yourself (default for ordinary
                             users)
      --no-same-permissions  apply the user's umask when extracting permissions
                             from the archive (default for ordinary users)
      --numeric-owner        always use numbers for user/group names
      --owner=NAME           force NAME as owner for added files
      --owner-map=FILE       use FILE to map file owner UIDs and names
  -p, --preserve-permissions, --same-permissions
                             extract information about file permissions
                             (default for superuser)
      --same-owner           try extracting files with the same ownership as
                             exists in the archive (default for superuser)
  -s, --preserve-order, --same-order
                             member arguments are listed in the same order as
                             the files in the archive
      --sort=ORDER           directory sorting order: none (default), name or
                             inode

 Handling of extended file attributes:

      --acls                 Enable the POSIX ACLs support
      --no-acls              Disable the POSIX ACLs support
      --no-selinux           Disable the SELinux context support
      --no-xattrs            Disable extended attributes support
      --selinux              Enable the SELinux context support
      --xattrs               Enable extended attributes support
      --xattrs-exclude=MASK  specify the exclude pattern for xattr keys
      --xattrs-include=MASK  specify the include pattern for xattr keys

 Device selection and switching:

  -f, --file=ARCHIVE         use archive file or device ARCHIVE
      --force-local          archive file is local even if it has a colon
  -F, --info-script=NAME, --new-volume-script=NAME
                             run script at end of each tape (implies -M)
  -L, --tape-length=NUMBER   change tape after writing NUMBER x 1024 bytes
  -M, --multi-volume         create/list/extract multi-volume archive
      --rmt-command=COMMAND  use given rmt COMMAND instead of rmt
      --rsh-command=COMMAND  use remote COMMAND instead of rsh
      --volno-file=FILE      use/update the volume number in FILE

 Device blocking:

  -b, --blocking-factor=BLOCKS   BLOCKS x 512 bytes per record
  -B, --read-full-records    reblock as we read (for 4.2BSD pipes)
  -i, --ignore-zeros         ignore zeroed blocks in archive (means EOF)
      --record-size=NUMBER   NUMBER of bytes per record, multiple of 512

 Archive format selection:

  -H, --format=FORMAT        create archive of the given format

 FORMAT is one of the following:

    gnu                      GNU tar 1.13.x format
    oldgnu                   GNU format as per tar <= 1.12
    pax                      POSIX 1003.1-2001 (pax) format
    posix                    same as pax
    ustar                    POSIX 1003.1-1988 (ustar) format
    v7                       old V7 tar format

      --old-archive, --portability
                             same as --format=v7
      --pax-option=keyword[[:]=value][,keyword[[:]=value]]...
                             control pax keywords
      --posix                same as --format=posix
  -V, --label=TEXT           create archive with volume name TEXT; at
                             list/extract time, use TEXT as a globbing pattern
                             for volume name

 Compression options:

  -a, --auto-compress        use archive suffix to determine the compression
                             program
  -I, --use-compress-program=PROG
                             filter through PROG (must accept -d)
  -j, --bzip2                filter the archive through bzip2
  -J, --xz                   filter the archive through xz
      --lzip                 filter the archive through lzip
      --lzma                 filter the archive through xz
      --lzop                 filter the archive through lzop
      --no-auto-compress     do not use archive suffix to determine the
                             compression program
  -z, --gzip, --gunzip, --ungzip   filter the archive through gzip
      --zstd                 filter the archive through zstd
  -Z, --compress, --uncompress   filter the archive through compress

 Local file selection:

      --backup[=CONTROL]     backup before removal, choose version CONTROL
  -h, --dereference          follow symlinks; archive and dump the files they
                             point to
      --hard-dereference     follow hard links; archive and dump the files they
                             refer to
  -K, --starting-file=MEMBER-NAME
                             begin at member MEMBER-NAME when reading the
                             archive
      --newer-mtime=DATE     compare date and time when data changed only
  -N, --newer=DATE-OR-FILE, --after-date=DATE-OR-FILE
                             only store files newer than DATE-OR-FILE
      --one-file-system      stay in local file system when creating archive
  -P, --absolute-names       don't strip leading '/'s from file names
      --suffix=STRING        backup before removal, override usual suffix ('~'
                             unless overridden by environment variable
                             SIMPLE_BACKUP_SUFFIX)

 File name transformations:

      --strip-components=NUMBER   strip NUMBER leading components from file
                             names on extraction
      --transform=EXPRESSION, --xform=EXPRESSION
                             use sed replace EXPRESSION to transform file
                             names

 Informative output:

      --checkpoint[=NUMBER]  display progress messages every NUMBERth record
                             (default 10)
      --checkpoint-action=ACTION   execute ACTION on each checkpoint
      --full-time            print file time to its full resolution
      --index-file=FILE      send verbose output to FILE
  -l, --check-links          print a message if not all links are dumped
      --no-quote-chars=STRING   disable quoting for characters from STRING
      --quote-chars=STRING   additionally quote characters from STRING
      --quoting-style=STYLE  set name quoting style; see below for valid STYLE
                             values
  -R, --block-number         show block number within archive with each message
                            
      --show-defaults        show tar defaults
      --show-omitted-dirs    when listing or extracting, list each directory
                             that does not match search criteria
      --show-snapshot-field-ranges
                             show valid ranges for snapshot-file fields
      --show-transformed-names, --show-stored-names
                             show file or archive names after transformation
      --totals[=SIGNAL]      print total bytes after processing the archive;
                             with an argument - print total bytes when this
                             SIGNAL is delivered; Allowed signals are: SIGHUP,
                             SIGQUIT, SIGINT, SIGUSR1 and SIGUSR2; the names
                             without SIG prefix are also accepted
      --utc                  print file modification times in UTC
  -v, --verbose              verbosely list files processed
      --warning=KEYWORD      warning control
  -w, --interactive, --confirmation
                             ask for confirmation for every action

 Compatibility options:

  -o                         when creating, same as --old-archive; when
                             extracting, same as --no-same-owner

 Other options:

  -?, --help                 give this help list
      --restrict             disable use of some potentially harmful options
      --usage                give a short usage message
      --version              print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

The backup suffix is '~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.
The version control may be set with --backup or VERSION_CONTROL, values are:

  none, off       never make backups
  t, numbered     make numbered backups
  nil, existing   numbered if numbered backups exist, simple otherwise
  never, simple   always make simple backups

Valid arguments for the --quoting-style option are:

  literal
  shell
  shell-always
  shell-escape
  shell-escape-always
  c
  c-maybe
  escape
  locale
  clocale

*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/usr/sbin/rmt
--rsh-command=/usr/bin/rsh
In [ ]: